SUMMARY

The nucleotide sequence determination of the genome of the Beaudette strain of the 
coronavirus avian infectious bronchitis virus (IBV) has been completed. The complete 
sequence has been obtained from 17 overlapping cDNA clones, the 5'-most of which
contains the leader sequence (as determined by direct sequencing of the genome) and 
the 3'-most of which contains the poly(A) tail. Approximately 8 kilobases at the 3' end 
of this sequence have already been published. These contain the sequences of mRNAs 
A to E within which are the genes for the spike, the membrane and the nucleocapsid 
polypeptides: the main structural components of the virion. The remainder of the 
sequence, equivalent to the ‘unique’ region of mRNA F, is some 20 kilobases in length 
and is thought to code for a polymerase or polymerases which are involved in the 
replication of the genome and the production of the subgenomic messenger RNAs. 

This sequence contains two large open reading frames, potentially coding for 
polypeptides of molecular weights 441000 and 300000. Unlike other large open 
reading frames in the virus, the 300000 open reading frame appears to have no 
subgenomic RNA associated with it which would allow it to be at the 5' end of an 
mRNA species. Because of this, and because of the characteristics of the sequence in 
the region immediately upstream of its start codon, other mechanisms of translation, 
such as ribosome slippage, must be postulated. 

INTRODUCTION 

Avian infectious bronchitis virus (IBV) is the type species of the family Coronaviridae 
(Siddell et al., 1983a). Coronaviruses are enveloped, pleomorphic particles with a distinctive
‘corona’ of club-shaped surface projections, and a large single-stranded RNA genome of positive
polarity (Siddell et al., 1983b). In infected cells, in addition to genome-sized RNA, a number of
subgenomic RNAs can be detected which have a common 3' terminus, but extend for different
lengths in the 5' direction, forming a nested set (Stern & Kennedy, 1980a, b; Leibowitz et al.,
1981). In the case of IBV these are designated mRNAs A to F, mRNA A being the smallest and 
mRNA F being of genome length. In vitro translation studies have demonstrated that mRNAs 
A, C and E code for the nucleocapsid polypeptide, the membrane polypeptide and the precursor 
polypeptide to the spike or surface projection respectively (Stern & Sefton, 1984). These three 
polypeptides form the three known structural proteins of coronavirus virions (Cavanagh, 1981). 
Sequencing of cDNA clones derived from IBV genomic RNA has shown that, in the case of 
mRNAs A, C and E, only the 5' region of each mRNA which is not present in the next smallest 
mRNA is translated (Boursnell et al., 1985a, 1984; Binns et al., 1985b). This region is often
referred to, for convenience, as the ‘unique’ region of the particular mRNA. For mRNAs B and 
D the situation is more complicated in that each mRNA has more than one open reading frame 
(ORF) and also has ORFs overlapping the next smallest mRNA (Boursnell & Brown, 1984; 
Boursnell et al., 1985b).

The genome of IBV is infectious (Lomniczi, 1977) indicating that it has a messenger function. 
There is also no evidence for a virion-associated RNA polymerase (Schochetman et al., 1977). 
On entry into the cell, therefore, the virion RNA probably codes for a polymerase, the gene for
which must lie in the large 5' region of the genome, the ‘unique’ region of mRNA F, which does
not contain the genes for the structural polypeptides. This polymerase would then be used to
synthesize a negative-stranded template. The negative strand could then be used by another
polymerase, or a modified form of the same polymerase, to produce the subgenomic mRNAs
and virion RNA. Both the negative strand and two distinct polymerase activities have been
detected in cells infected with the coronavirus mouse hepatitis virus (MHV) (Lai et al., 1982;
Brayton et al., 1982). Translation of MHV virion RNA in reticulocyte lysates produced three
structurally related polypeptides of molecular weights greater than 200000 (200K) (Leibowitz et
al., 1982).

In this paper we present the nucleotide sequence, obtained from cDNA clones, of the ‘unique’
region of mRNA F, the genome-sized mRNA. The sequence of approximately 8 kilobases from
the 3' end of the genome, containing the genes for the major structural polypeptides, has already
been published (Boursnell & Brown, 1984; Boursnell et al., 1984, 1985a, b; Binns et al., 1985b).
The 20500 bases of sequence reported here complete the sequence of the IBV genome, which is, 
as far as we are aware, the first complete sequence of a coronavirus and the largest RNA virus 
sequenced to date. 


METHODS 

cDNA cloning. Seventeen cDNA clones covering the 3'-most 27.569 kb of the genome have been obtained. These
are shown in Fig. 1. They have been derived from RNA isolated from gradient-purified virus of the Beaudette 
strain (Beaudette & Hudson, 1937; Brown & Boursnell, 1984). cDNA has been obtained by three methods: 
oligo(dT) priming (Brown & Boursnell, 1984), priming with specific oligonucleotides (Boursnell et al., 1984) and
random priming with calf thymus DNA oligonucleotides (Binns et al., 1985a). The Southern blotting technique
was used to identify overlapping clones (Southern, 1975). Specific cDNA clones were identified using ‘prime-cut’
probes. These are made by synthesizing labelled DNA from selected M13 clones using the normal sequencing
primer, cutting with a restriction enzyme, and eluting the labelled, single-stranded probe from denaturing
acrylamide gels (Biggin et al., 1984).

Subcloning for M13 sequencing. Random subclones of each cDNA clone were generated by sonication
(Deininger, 1983) and subcloning into SmaI-cut, phosphatase-treated M13mp10 (Amersham). Bacterial colonies
containing M13 with inserts were grown, transferred to nitrocellulose filters, and probed with nick-translated
purified viral insert DNA from the cDNA clone. Single-stranded templates were prepared from M13 clones
identified as viral in this way. 

DNA sequencing. Sequencing was carried out by the dideoxy method (Sanger et al., 1977; Bankier & Barrell, 
1983). [α-³⁵S]dATP was used in the sequencing reactions and the products were analysed on buffer gradient gels
(Biggin et al., 1983). Additional sequencing information was obtained by reverse sequencing (Hong, 1981). For 
regions containing compressions due to DNA secondary structure, sequencing samples were run on hot (80 °C) 
gels or gels containing 42% formamide. For some regions cytosine residues were modified by the method of 
Ambartsumyan & Mazo (1980) prior to separating on gels, to reduce GC base pairing. Deoxyinosine triphosphate 
(Bankier & Barrell, 1983) and deoxy-7-deazaguanosine triphosphate (Mizusawa et al., 1986) were used in place of 
deoxyguanosine triphosphate in some cases, again to reduce GC base pairing. For sequencing directly from the 
viral RNA the method used was essentially as described by Caton et al. (1982). 

Computer analysis of the sequence data. Sequence data were read directly into a BBC microcomputer using a 
sonic digitizer (Graf/Bar, Science Accessories Corporation) and data were analysed on a VAX 11/750 using the 
programs of Staden (1982a, b, 1984a, b). Comparisons with the National Biomedical Research Foundation
(NBRF) protein identification resource were made using the programs SEARCH and FASTP (George et al., 1986;
Lipman & Pearson, 1985) and SEQHP (Kanehisa, 1982).

RESULTS 

Selection of cDNA clones 

The majority of the cDNA clones which have been used to obtain the sequence of the ‘unique’ 
region of mRNA F were produced by a random priming method (Binns et al., 1985a). Clone 182
was produced by priming with a specific oligonucleotide from existing sequence at the 5' end of 
mRNA D. Clone 227 was identified as coming from the 5' end of the genome by probing a 
random library with leader-specific probes. The randomly primed clones 217, 216, 204, 210, 205,
220 and 249 were mapped by identifying overlaps using Southern blotting. The nine clones were
not contiguous but formed four blocks. cDNA clones in the region of the three remaining gaps
were obtained using specific oligonucleotide primers. Clones spanning the gaps were identified
using either ‘prime-cut’ probes (Biggin et al., 1984) made from M13 subclones of cDNA clones
on either side of the gap or by using Southern blotting. Five clones, 256, 263, BP3, BP5 and BP8,
were identified in this way and the overlaps confirmed by sequencing. Fig. 1 shows the positions
of all the cDNA clones used in obtaining the complete sequence of the virus, and the positions of
the oligonucleotide primers.

Fig. 1. Diagram showing the positions of all the cDNA clones used in obtaining the nucleotide
sequence. The squares at the end of some of the clones show the positions of oligonucleotide primers
used to prime synthesis of cDNA for adjacent clones. Above the clones are shown mRNAs A to F.
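
The mapping step described above, in which nine clones fell into four blocks separated by three gaps, can be illustrated with a minimal Python sketch that merges overlapping intervals. The coordinates below are hypothetical values chosen only so that nine intervals form four blocks; they are not the mapped positions of the clones, and the function names are illustrative.

```python
def merge_into_blocks(clones):
    """Group (start, end) clone intervals into contiguous blocks."""
    blocks = []
    for start, end in sorted(clones):
        if blocks and start <= blocks[-1][1]:
            # Overlaps the current block: extend it if needed.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            # No overlap: start a new block.
            blocks.append([start, end])
    return [tuple(block) for block in blocks]


def gaps_between(blocks):
    """Return the uncovered (end, next_start) stretches between blocks."""
    return [(left[1], right[0]) for left, right in zip(blocks, blocks[1:])]


# Hypothetical clone coordinates in kb, for illustration only.
example_clones = [
    (0.0, 2.5), (2.0, 5.0),
    (6.0, 9.0), (8.5, 11.0),
    (12.0, 14.0), (13.5, 16.0), (15.0, 17.5),
    (18.5, 20.0), (19.5, 21.0),
]
blocks = merge_into_blocks(example_clones)
gaps = gaps_between(blocks)
print(len(blocks), "blocks;", len(gaps), "gaps")
```

Each gap returned by `gaps_between` corresponds to a region for which a specifically primed clone would be needed.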


DNA sequencing 

Fourteen cDNA clones have been sequenced to obtain the complete sequence of the ‘unique’ 
region of mRNA F, the genome-sized messenger RNA. The 20500 bases of sequence presented 
here stretch from the 5' end of the genome to an arbitrary position 190 bases 3'-wards of the end 
of the body of mRNA E. The 39 nucleotides at the very 5' end of the genome have not been 
obtained in cDNA clones from the Beaudette strain, and the sequence here is derived from 
Maxam & Gilbert (1980) sequencing of primer-extended products from Beaudette virion RNA 
(Brown et al., 1986). Fig. 2 shows the DNA sequence obtained from the cDNA clones, with a
translation in single-letter amino acid code of the main ORFs. 
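
As an illustration of how such a single-letter translation is derived, the following minimal Python sketch translates the short ORF that begins at position 131 of the sequence, using the standard genetic code. The helper names are illustrative and this is not the software used in the original analysis.

```python
BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA-sense sequence from its first base up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Bases 131-166 of the sequence: the first AUG and the small ORF that follows it.
small_orf = "ATGGCACCTGGCCACCTGTCAGGTTTTTGTTATTAA"
small_peptide = translate(small_orf)
print(small_peptide, len(small_peptide))
```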

Sequence analysis 

Fig. 3 shows the positions of ORFs in this region. Most of the sequence encodes two very large 
ORFs which could code for polypeptides of predicted molecular weights 441K and 300K. These
two large ORFs have been designated F1 and F2.
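
A rough consistency check relates these predicted molecular weights to ORF length. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from this paper, so the results are order-of-magnitude estimates only.

```python
# Assumption: average amino acid residue mass of ~110 Da (rule of thumb).
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_residues(molecular_weight_da):
    """Estimate the number of residues in a polypeptide of the given mass."""
    return round(molecular_weight_da / AVERAGE_RESIDUE_MASS_DA)


def approx_coding_bases(molecular_weight_da):
    """Estimate the coding length in bases (three bases per residue)."""
    return 3 * approx_residues(molecular_weight_da)


for name, mw in (("F1", 441000), ("F2", 300000)):
    print(name, approx_residues(mw), "residues,", approx_coding_bases(mw), "bases")
```

Under this assumption the two ORFs together would span roughly 20 kilobases, in line with the length of the ‘unique’ region of mRNA F.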

The first large ORF, F1, is not the first ORF to occur after the homology region. At position
131 there is an AUG codon followed by a small ORF which could code for a polypeptide of 11
amino acids. This AUG is the first initiation codon to occur on the genome. The second
initiation codon is at the start of F1. Both the large ORFs have a codon usage (Staden &
McLachlan, 1982) very similar to that of the genes for the structural polypeptides S, M and N. 
The small ORF also appears to have the same codon usage, insofar as that is significant for such 
a short sequence. After the end of the small ORF the reading frame is open, in the other two 
possible frames, for a further 232 or 73 bases but the codon usage of the predicted amino acids 
for these sections of ORF is not similar to that previously found for IBV. The sequence context 
around the first AUG codon is not similar to that used by most eukaryotic mRNAs (Kozak, 
1983) in that it has a pyrimidine at position −3. The context around the second AUG on the
other hand has a purine at −3, in addition to a C at positions −1 and −4, both of which mean
that it conforms well to the consensus for functional initiation codons. 
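
The comparison of the two initiation contexts can be reproduced with a short Python sketch. The two flanking fragments are copied from the sequence around positions 131 and 529; the function names are illustrative, and the check covers only the −3 purine and the bases at −1 and −4 discussed above, not a full consensus score.

```python
PURINES = {"A", "G"}


def upstream_context(seq, atg_index):
    """Return the bases at positions -4 to -1 relative to the A of the ATG."""
    if seq[atg_index:atg_index + 3].upper() != "ATG":
        raise ValueError("atg_index does not point at an ATG")
    return {pos: seq[atg_index + pos].upper() for pos in (-4, -3, -2, -1)}


def has_purine_at_minus3(seq, atg_index):
    """True if the base three positions upstream of the ATG is A or G."""
    return upstream_context(seq, atg_index)[-3] in PURINES


# DNA-sense fragments around the first AUG (position 131) and the AUG at the start of F1.
first_fragment = "GTGCCCTGGATGGCACCT"
second_fragment = "CACTGACAACATGGCTTCA"

first_atg = first_fragment.find("ATG")
second_atg = second_fragment.find("ATG")

print(upstream_context(first_fragment, first_atg), has_purine_at_minus3(first_fragment, first_atg))
print(upstream_context(second_fragment, second_atg), has_purine_at_minus3(second_fragment, second_atg))
```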
1 ACTTAAGATAGATATTAATATATATCTATTACACTAGCCTTGCGCTAGATTTTTAACTTAACAAAACGGACTTAAATACCTACAGCTGGTCCTCATAGGT

MAPGHLSGFCY*

101 GTTCCATTGCAGTGCACTTTAGTGCCCTGGATGGCACCTGGCCACCTGTCAGGTTTTTGTTATTAAAATCTTATTGTTGCTGGTATCACTGCTTGTTTTG 
201 CCGTGTCTCACTTTATACATCTGTTGCTTGGGCTACCTAGTGTCCAGCGTCCTACGGGCGTCGTGGCTGGTTCGAGTGCGAGGAACCTCTGGTTCATCTA 
301 GCGGTAGGCGGGTGTGTGGAAGTAGCACTTCAGACGTACCGGTTCTGTTGTGTGAAATACGGGGTCACCTCCCCCCACATACCTCTAAGGGCTTTTGAGC 
401 CTAGCGTTGGGCTACGTTCTCGCATAAGGTCGGCTATACGACGTTTGTAGGGGGTAGTGCCAAACAACCCCTGAGGTGACAGGTTCTGGTGGTGTTTAGT 

MASSLKQGVSPKPRDVILVSKDIP
501 GAGCAGACATACAATAGACACTGACAACATGGCTTCAAGCCTAAAACAGGGAGTATCTCCCAAACCACGGGATGTCATTCTTGTGTCCAAAGACATCCCT 

EQLCDALFFYTSHNPKDYADAFAVRQKFDRSLQT
601 GAACAACTTTGTGACGCTTTGTTTTTCTATACGTCACATAACCCTAAGGATTACGCTGATGCTTTTGCAGTTAGGCAGAAGTTTGACCGTAGTCTCCAGA 

GKQFKFETVCGLFLLKGVDKITPGVPAKVLKAT
701 CTGGGAAACAGTTCAAATTTGAAACTGTGTGTGGTCTCTTCCTCTTGAAGGGAGTTGACAAAATAACACCTGGCGTCCCAGCAAAAGTTTTAAAAGCCAC 

SKLADLEDIFGVSPLARKYRELLKTACQWSLTV
801 TTCTAAGTTGGCAGATTTAGAAGACATCTTTGGTGTCTCTCCTTTAGCGCGGAAGTACCGTGAATTGTTGAAAACAGCGTGTCAGTGGTCTCTTACTGTA 

EALDVRAQTLDEIFDPTEILWLQVAAKIHVSSMA
901 GAAGCACTGGATGTTCGTGCACAAACTCTCGATGAAATTTTTGACCCCACTGAAATACTTTGGCTTCAGGTGGCTGCAAAAATTCATGTTTCATCTATGG 

MRRLVGEVTAKVMDALGSNLSALFQIVKQQIAR
1001 CAATGCGCAGGCTTGTTGGAGAAGTAACTGCAAAAGTCATGGATGCTCTGGGCTCAAACTTGAGTGCTCTTTTTCAAATTGTTAAACAACAAATAGCCAG 

IFQKALAIFENVNELPQRIAALKMAFAKCARSI
1101 AATCTTTCAAAAGGCACTGGCTATTTTTGAGAATGTGAATGAATTACCACAGCGTATTGCAGCACTTAAGATGGCTTTTGCCAAGTGTGCTAGGTCAATT 

TVVVVERTLVVKEFAGTCLASINGAVAKFFEELP
1201 ACTGTTGTGGTTGTTGAAAGAACTCTAGTTGTTAAAGAGTTCGCAGGAACTTGTCTTGCAAGCATTAATGGTGCTGTCGCAAAATTCTTTGAAGAGTTGC 

NGFMGSKIFTTLAFFKEAAVRVVENIPNAPRGT
1301 CAAACGGCTTCATGGGTTCTAAGATTTTCACAACACTTGCCTTCTTTAAAGAGGCAGCTGTGAGAGTTGTGGAGAACATACCAAATGCACCGAGAGGTAC 

KGFEVVGNAKGTQVVVRGMRNDLTLLDQKADIP
1401 TAAGGGATTTGAAGTTGTTGGCAATGCCAAAGGCACACAGGTAGTTGTGCGCGGCATGCGAAATGACTTAACATTGCTTGACCAAAAAGCTGATATTCCT 

vepegwsaildghlcyufrsgorfyaaplsgnfa 
1501 gttgaaccagaaggttggtctgcaattttggatggacatctttgctatgtctttaggagtggtgatcgcttttatgctgcacctctttcaggaaattttg 

LSDVHCCERVVCLSDGVTPEINDGLILAAIYSS 
1601 ctttgagtgatgttcattgctgtgagcgtgtagtctgtctatctgatggtgtaacaccggagataaatgatggactcattctagctgcaatctactcttc 

FSVSELUTALKKGEPFKFLGHKFVYAKDAAVSF 
1701 ttttagtgtctctgagcttgtaacagctcttaaaaagggtgaaccattcaagttcttgggccataaattcgtgtatgcgaaggatgcagcagtgtctttt 

tlakaatiadvirlfqsaruiaeduwssfteksf 
1801 actttagcgaaggctgccactattgcagatgtcttgaggctgtttcaatcagctcgtgtgatagcagaagatgtttggtcttcatttactgaaaagtctt 

EFWKLAYGKVRIMLEEFVKTYUCKAQMSIVILAA 
1901 ttgaattctggaagcttgcatatggaaaagtgcgcaaccttgaagaatttgtgaagacctatgtttgtaaggctcaaatgtcgattgtgattctagcagc 

\/LGEDIUfHLVSQ\/IYKlG\/LFTK\/\/DFCDKHliJK 
2001 AGTGCTTGGAGAGGACATTTGGCATCTTGTCTCACAAGTCATCTATAAATTAGGTGTTCTTTTTACTAAAGTCGTTGACTTTTGTGACAAACACTGGAAA 

GFCVQLKRAKLIVTETFCI/LKGVAQHCFQLLLDA 
2101 GGTTTTTGTGTACAGTTGAAAAGAGCTAAGCTCATTGTCACCGAAACCTTCTGTGTTTTAAAAGGAGTTGCACAGCATTGTTTTCAACTGCTGCTAGATG 

ihslyksfkkcaigrihgdllfwkggvhkivqd 
2201 caatacactctttgtacaagagttttaagaagtgtgcacttggtagaatccatggagatttgctcttctggaaaggaggtgtgcataaaattgttcaaga 

GDEIWFDAIDSVDVEDLGVVQEKSIDFEVCDDV 
2301 tggcgatgaaatatggtttgacgccattgatagtgttgatgttgaagatctgggtgttgttcaggaaaaatcgattgattttgaggtttgcgatgacgtg 

TLPENQPGHMVQIEODGKNYMFFRFKKDENIYYT 

2401 acacttccagaaaaccaacctggtcatatggttcaaatagaggatgatggtaagaactacatgttcttccgttttaaaaaggatgagaacatttattata 
2501 CACCAATGTCTCAACTTGGTGCTATTAATGTGGTTTGCAAAGCAGGCGGTAAGACTGTCACCTTTGGAGAAACTACAGTACAAGAGATACCACCACCTGA

VVPIKVSIECCGEPWNTIFKKAYKEPIEVDTDL 
2601 TGTCGTGCCTATTAAGGTTAGCATAGAATGTTGTGGTGAACCATGGAATACGATCTTCAAGAAGGCTTATAAAGAGCCTATAGAAGTAGATACAGACCTC

TVEQLLSUIYEKnCDDLKLFPEAPEPPPFENUAL 
2701 ACAGTAGAACAATTGCTCTCTGTGATCTATGAGAAAATGTGTGACGACCTTAAATTGTTTCCAGAGGCACCAGAGCCTCCACCATTTGAGAATGTCGCAC 

VOKNGKDLDCIKSCHLIYRDYESDDDIEEEDAE 
2801 TTGTTGATAAGAACGGTAAAGATTTGGATTGTATAAAATCTTGCCATTTGATCTATCGTGACTATGAGAGCGATGATGACATCGAGGAGGAAGATGCTGA 

ECDTQSGEAEECDTN5ECEEEDEDTKI/LALIQD 
2901 GGAGTGTGACACAGACTCAGGTGAAGCTGAGGAGTGTGACACTAATTCAGAATGTGAAGAAGAGGATGAGGATACTAAAGTGTTGGCTCTTATACAAGAC

PASIKYPLPIDEDYSVYNGCIVHKDALDVVNLPS 
3001 CCGGCAAGTATTAAATACCCTCTGCCTCTTGATGAAGATTATAGCGTCTATAATGGATGTATTGTACACAAGGACGCTCTTGATGTTGTGAATTTACCAT

GEETFVVNNCFEGAVKPLPQKVVDVLGDWGEAV 
3101 CTGGTGAAGAAACTTTTGTTGTCAATAACTGTTTTGAGGGAGCTGTTAAACCACTTCCACAGAAGGTAGTTGATGTTCTTGGTGACTGGGGAGAGGCTGT 

DAQEQLCQQEPLQHTFEEPVENSTGS5KTMTEQ 
3201 TGATGCGCAAGAACAACTGTGTCAACAAGAGCCTCTGCAACATACCTTTGAAGAACCAGTCGAAAATTCTACTGGTAGTTCTAAGACAATGACTGAACAA

l/yVEOQELPVVEQDQDVVUYTPTOLEyAKETAEE 
3301 GTCGTTGTAGAAGATCAAGAACTACCTGTTGTTGAACAAGATCAGGATGTAGTTGTTTATACACCTACAGATCTTGAAGTTGCAAAAGAAACAGCAGAAG 

yDEFILIFAyPKEEyySQKDGAQIKQEPIQVyK 
3401 AGGTTGATGAGTTTATTCTCATTTTTGCTGTTCCTAAAGAAGAAGTTGTGTCCCAGAAAGATGGGGCACAGATTAAACAAGAGCCTATTCAAGTTGTTAA 

PQREKKAKKFKyKPATCEKPKFLEYKTCyGDLT 
3501 ACCACAACGTGAGAAGAAGGCTAAAAAGTTCAAAGTTAAACCAGCCACATGTGAGAAACCTAAATTTTTGGAGTATAAAACATGTGTGGGTGATTTGACT 

VyiAKALDEFKEFCIVNAANEHWTHGSGVAKAIA 
3601 GTTGTAATTGCCAAAGCATTGGATGAGTTTAAAGAGTTCTGCATTGTAAATGCTGCAAATGAGCATATGACTCATGGTAGTGGCGTTGCAAAGGCAATTG 

DFCGLDFyEYCEDYyKKHGPQQRLyTPSFyKGI 
3701 CAGACTTTTGTGGACTGGATTTTGTTGAATATTGTGAGGACTATGTTAAGAAACATGGGCCACAACAGAGACTTGTTACACCTTCGTTTGTCAAAGGCAT 

QCVNNVyGPRHGDNNLHEKLVAAYKNVLVDGVy 
3801 TCAATGTGTGAATAATGTTGTAGGACCCCGCCATGGAGACAACAACTTGCATGAGAAGCTTGTTGCTGCCTACAAGAATGTGCTTGTAGATGGCGTAGTC 

NYyyPyLSLGIFGyDFKPISIDAnREAFEGCTIRy 
3901 AATTATGTTGTGCCAGTTCTTTCATTAGGAATTTTTGGTGTAGATTTTAAAATGTCAATAGACGCAATGCGTGAAGCTTTTGAAGGTTGCACCATACGCG 

LLFSLSQEHIDYFDVTCKQKTIYLTEDGyKYRS 
4001 TTCTTTTGTTTTCTCTGAGCCAAGAACACATCGATTATTTCGATGTAACTTGCAAACAGAAGACAATTTATCTTACGGAGGATGGTGTTAAATACCGCTC 

iyLKPGOSLGQFGQVYAKNKIVFTADDyEDKEI 
4101 CATTGTTCTAAAACCTGGTGACTCATTGGGTCAATTTGGACAGGTTTATGCTAAAAACAAGATAGTTTTTACAGCCGATGATGTTGAGGACAAAGAAATT 

LYyPTTDKSILEYYGLDAQKYyiYLQTLAQKiiJNy 
4201 CTCTACGTCCCCACGACTGATAAAAGCATTCTTGAATACTATGGTTTAGATGCGCAAAAGTATGTAATATATTTGCAAACGCTTGCGCAGAAATGGAATG

QYRDNFLILEUJRDGNCiillSSAiyLLQAAKIRFK 
4301 TCCAATATAGGGACAATTTTCTTATACTAGAGTGGCGCGATGGAAATTGTTGGATTAGTTCAGCAATAGTTCTCCTTCAAGCTGCTAAAATTAGGTTTAA 

GFLTEAWAKLLGGDPTDFVAli/CYASCTAKVGDF 
4401 AGGTTTTCTAACAGAAGCGTGGGCTAAACTGTTAGGTGGAGATCCTACAGACTTTGTTGCCTGGTGTTATGCAAGTTGTACTGCTAAAGTAGGTGATTTC

SDANWLLANLAEHFDADYTNAFLKKRySCNCGIK 
4501 TCAGATGCTAATTGGCTTTTAGCGAATTTAGCAGAACATTTTGACGCAGATTACACAAATGCGTTTCTTAAGAAGCGCGTTTCGTGTAACTGTGGTATTA 

SYELRGIEACIQPVRATNLIHFKTQYSNCPTCG 
4601 AGAGCTATGAGCTTAGAGGCCTTGAAGCTTGTATTCAGCCAGTTCGGGCAACTAATCTGCTACATTTTAAGACGCAATATTCAAATTGCCCAACCTGTGG 

ANNTDEyiEASLPYLLLFATDGPATyDCDEOAy 
4701 CGCAAATAATACGGATGAAGTAATAGAAGCTTCGTTACCGTACTTATTGCTTTTTGCTACTGATGGTCCTGCTACAGTTGATTGTGATGAAGATGCTGTG 
4801 GGGACTGTCGTGTTTGTTGGTTCTACTAATAGTGGCCATTGTTATACACAAGCTGCAGGGCAAGCTTTTGATAATCTTGCTAAAGATAGAAAATTTGGAA 4900


M. E. G. BOURSNELL AND OTHERS 


kspyitamytrfafknetslpvakqskgksksv 

4901 AGAAGTCGCCTTACATTACTGCAATGTATACGCGATTCGCTTTTAAGAATGAAACCTCTTTGCCTGTTGCTAAACAGAGCAA&GGTAAGTCTAAGTCGGT 5000 

KEDVSNLATSSKASFDNLTDFEQUJYDSNIYESL 
5001 AAAGGAAGATGTTTCTAACCTTGCTACTAGTTCTAAGGCCAGTTTTGATAATCTTACTGACTTCGAACAGTGGTATGATAGTAACATCTATGAAAGTCTT 5100

KVQESPDNFDKY\/SFTTKEDSKLPLTLK\/RGIKS 
5101 AAAGTGCAGGAATCACCTGATAACTTTGATAAATATGTGTCATTCACAACAAAGGAAGATTCTAAGTTGCCATTGACACTTAAGGTTAGAGGTATTAAAT 5200 

VVDFRSKDGFIYKLTPDTDENSKAPVYYPVLDA 
5201 CAGTTGTTGACTTTAGATCGAAGGATGGTTTTATTTATAAGTTAACACCTGATACTGATGAAAATTCAAAAGCACCAGTCTACTACCCAGTCTTGGACGC 5300 

ISLKAIWVEGNANFVVGHPNYYSKSLHIPTFtilE 
5301 TATTAGTCTTAAGGCAATATGGGTGGAAGGTAATGCTAACTTTGTTGTTGGTCATCCAAATTATTATAGTAAGTCTCTTCATATTCCTACTTTTTGGGAA 5400 

naenfvkmddkiggvtmglwraehlnkpnierif 

5401 AATGCTGAGAATTTTGTTAAAATGGGTGATAAAATTGGTGGTGTAACTATGGGACTTTGGCGTGCAGAACACCTTAATAAACCTAATTTGGAGAGAATTT 5500 

niakkaivgssvvttqcgkligkaatfiadkvg 

5501 TCAACATTGCTAAGAAAGCCATTGTTGGATCTAGTGTTGTTACTACACAATGCGGTAAATTAATAGGTAAAGCAGCTACATTCATTGCTGATAAAGTAGG 5600 

ggvvrnitdsikglcgitrghferknspqflkt 

5601 TGGTGGTGTAGTTCGCAATATTACAGATAGCATTAAGGGTCTTTGTGGAATTACACGAGGGCATTTTGAAAGAAAAATGTCTCCACAATTCCTAAAGACG 5700 

LMFFLFYFLKASVKSVVASYKTVLCKVVLATLLI
5701 CTTATGTTCTTTTTATTCTATTTCTTGAAGGCTAGTGTTAAGAGTGTTGTCGCTAGCTATAAGACCGTGTTATGTAAGGTGGTACTTGCTACTTTACTTA 5800

vufvytsnpvmftgirvldflfegslcgpykdy 

5801 TAGTTTGGTTTGTCTACACAAGTAACCCAGTAATGTTTACAGGAATACGTGTGTTAGATTTTCTATTCGAGGGTTCTTTGTGTGGTCCTTATAAAGACTA 5900 

GKDSFDVLRYCADDFICRVCLHDKDSLHLYKHA 
5901 TGGTAAAGATTCTTTTGATGTGTTACGATATTGTGCAGATGATTTTATTTGTCGTGTGTGTTTACATGACAAAGATTCACTTCATTTGTACAAACACGCT 6000

YSUEQVYKDAASGFIFNIiJNlilLYLUFLILFyKPUA 
6001 TATAGTGTAGAGCAGGTCTATAAAGATGCAGCTTCTGGTTTTATTTTTAATTGGAATTGGCTTTATTTGGTCTTTCTAATATTATTTGTTAAACCAGTGG 6100 

GFVIICYCVKYLVLNSTVLQTGVCFLDlilFVQTV 
6101 CAGGTTTTGTTATTATTTGCTATTGTGTTAAGTATTTGGTATTGAATTCAACTGTGCTGCAAACTGGTGTTTGTTTTTTAGATTGGTTTGTACAAACAGT 6200 

FSHFNFNGAGFYFWLFYKIYIQVHHILYCKDVT 
6201 TTTTAGTCACTTTAATTTTATGGGAGCAGGGTTTTATTTCTGGCTCTTTTACAAGATATATATACAGGTGCATCATATACTGTATTGTAAGGATGTAACA 6300 

CEVCKRVARSNRQEVSVVVGGRKQIVHVYTNSGY 
6301 TGTGAAGTGTGCAAAAGGGTTGCACGCAGCAACAGGCAAGAGGTTAGCGTGGTTGTTGGTGGACGCAAGCAGATAGTGCATGTTTACACTAACTCTGGCT 6400 

NFCKRHNWYCRNCDDYGHQNTFMSPEVAGELSE 
6401 ATAACTTTTGTAAGAGACATAATTGGTATTGTAGAAATTGTGATGATTATGGTCACCAAAATACATTTATGTCTCCTGAAGTTGCTGGCGAGCTCTCTGA 6500 

KLKRHUKPTAYAYHVVDEACLVDDFVNLKYKAA 
6501 AAAGCTTAAGCGCCATGTTAAACCTACAGCATACGCTTACCACGTTGTGGATGAGGCATGCTTAGTTGATGATTTTGTCAATTTAAAATATAAAGCTGCA 6600

tpgkdsassavkcfsvtdflkkavflkealkceq 

6601 ACTCCTGGTAAGGATAGTGCATCTTCAGCTGTTAAGTGTTTCAGTGTTACAGATTTCTTGAAGAAAGCTGTTTTTCTTAAGGAAGCACTGAAATGTGAAC 6700 

ISNDGFI VCNTQSAHALEEAKNAAIYYAQYLCK 
6701 AAATATCTAATGATGGTTTTATAGTGTGTAATACACAGAGTGCTCATGCATTAGAGGAAGCAAAGAATGCAGCCATCTATTATGCGCAATATCTGTGTAA 6800

PILILDQALYEQLVVEPVSKSVIDKVCSILSSI 
6801 GCCAATACTTATACTTGACCAGGCACTTTATGAGCAATTAGTAGTAGAGCCTGTGTCTAAGAGTGTTATAGATAAAGTGTGTAGCATTTTGTCTAGTATA 6900 

isvdtaainykagtlrdallsitkdeeavdmaif 
6901 atatctgtagatactgcagctttaaattataaggcaggcacacttcgtgatgctctgctttctattactaaagacgaagaggccgtagatatggctat AT 7000 

chnhdvdytgdgftnvipsygidtgkltprdrg 
7001 tctgtcataatcatgatgtggattacactggtgatggttttactaatgtgataccgtcatatggtatagacactggcaagttaacacctcgtgatagagg 7100 

flinadasianlrvknappvi/wkfseliklsds 
7101 gtttttgataaatgcagatgcttctattgctaacttaagagttaaaaatgctccgccggtagtatggaagttttctgagcttattaagttgtctgacagt 7200 

clkylisatvksgvrffitksgakquiachtqkl 
7201 tgtcttaaatatttaatttcggctactgttaagtcaggtgttcgtttctttataacaaagtctggtgctaaacaagttattgcttgtcatacacagaagt 7300 
LVEKKAGGIVSGTFKCFKSYFKWLLIFYILFTA
7301 TGTTAGTAGAGAAAAAGGCAGGTGGTATTGTTAGCGGCACCTTTAAGTGTTTTAAGAGTTATTTTAAATGGCTCTTGATCTTTTACATACTTTTTACAGC 

CCSGYYYNEVSKSFVHPNYDVNSTLHVEGFKVI 
7401 ATGTTGTTCGGGTTATTACTATATGGAGGTGAGTAAAAGTTTTGTTCACCCCATGTATGATGTAAACTCCACACTGCATGTTGAAGGTTTTAAAGTTATA 

DKGVLRElVPEDTCFSNKFVNFDAFlilGRPYDNSR 
7501 GATAAAGGTGTTCTTAGGGAAATTGTACCAGAAGATACATGTTTCTCTAATAAATTTGTTAATTTTGATGCTTTTTGGGGCAGACCATATGATAATAGTA

NCPIVTAVIDGDGTWATGUPGFUSliJVrcDGVNFI 
7601 GAAACTGTCCAATTGTCACAGCTGTTATAGATGGTGATGGGACAGTAGCTACAGGTGTTCCTGGTTTTGTGTCCTGGGTTATGGATGGTGTTATGTTTAT 

HflTQTERKPWYIPTliIFNREIVGYTQDSIITEGS 
7701 ACATATGACACAGACTGAGAGAAAACCGTGGTACATTCCTACTTGGTTTAATAGAGAAATTGTCGGTTACACTCAGGATTCAATTATTACTGAGGGTAGT

fytsialfsarclyltasntpqlycfngondapg 

7801 TTTTATACATCTATAGCGTTATTTTCCGCTAGGTGTTTATATTTAACAGCCAGCAATACACCTCAATTGTATTGCTTTAATGGTGATAATGATGCACCTG 

alpfgsiiphrvyfqpngvrlivpqqilhtpyv 

7901 GGGCTTTGCCATTTGGTAGTATTATTCCTCATAGAGTTTATTTCCAACCCAATGGTGTTAGGCTTATAGTTCCACAACAAATACTGCACACACCCTACGT 

VKFUSOSYCRGSUCEYTRPGYCVSLNPQIliVLFN 
8001 AGTAAAGTTTGTATCAGACAGCTATTGTAGGGGTAGTGTGTGTGAGTACACTAGACCAGGTTACTGTGTGTCATTAAACCCACAATGGGTTTTGTTTAAT

DEYTSKPGVFCGSTVRELMFSNVSTFFTGyNPNI 
8101 GACGAATACACAAGTAAACCCGGTGTTTTCTGTGGTTCTACTGTTAGAGAACTTATGTTTAGTATGGTTAGTACATTCTTTACTGGTGTTAACCCCAATA

ywqlatmflilvvvvlifamvikfqgvfkayat 
8201 tctatatgcaattagcaactatgtttttaatactagttgttgttgtattaatctttgcaatggttataaagtttcaaggtgtttttaaagcttatgcaac 

TVFITMLUWVINAFILCVHSYNSVLAVILLVLY 
8301 CACTGTTTTTATAACAATGTTAGTTTGGGTAATTAACGCATTTATTTTGTGTGTACATAGTTACAACAGTGTTTTAGCTGTTATATTACTAGTACTCTAT 

CY A5LVT SRNTUI IMHCULVF TFGL IVPTliILACC 
8401 TGCTATGCGTCATTGGTTACAAGTCGCAATACTGTTATAATAATGCATTGTTGGCTTGTTTTTACCTTTGGTTTAATAGTACCCACATGGTTGGCTTGTT 

YLGFIIYNYTPLFlbJCYGTTKNTRKLYDGNEFV 
8501 GCTACCTGGGATTTATTATTTATATGTATACACCGTTGTTTTTATGGTGTTATGGTACTACAAAAAACACTCGTAAGCTGTATGATGGCAATGAGTTTGT 

GNYDLAAKSTFUIRGSEFt/KLTNEIGDKFEAYL 
8601 TGGTAATTATGATCTTGCTGCGAAGAGCACTTTTGTTATTCGCGGCTCTGAATTTGTTAAGCTTACTAATGAGATAGGTGATAAATTTGAGGCCTACCTT 

SAYARLKYYSGTGSEQDYLQACRAULAYALDQYR 
8701 TCAGCGTATGCTAGATTAAAGTACTATTCAGGCACTGGCAGTGAACAAGATTATTTGCAAGCTTGTCGTGCATGGTTAGCTTATGCTTTGGACCAATATA

NSGVEIVYTPPRYSIGVSRLQSGFKKLVSPSSA 
8801 GAAATAGTGGTGTGGAAATTGTTTATACTCCGCCACGTTACTCTATTGGTGTTAGTAGATTACAATCTGGTTTTAAGAAACTGGTTTCTCCTAGTAGTGC

VEKCIVSVSYRGNNLNGLliJLGDTIYCPRHVLGK 
8901 TGTTGAAAAGTGCATTGTTAGTGTCTCTTATAGAGGTAATAATCTTAATGGACTGTGGCTAGGTGACACTATCTACTGTCCTCGTCATGTATTGGGTAAG 

FSGDQli/NDy/LNLANNHEFEVTTQHGVTLNVVSRR 
9001 TTTTCAGGTGACCAATGGAATGATGTACTTAATCTTGCTAATAATCATGAGTTTGAAGTTACAACTCAACATGGTGTTACTTTGAATGTTGTCAGTAGGC 

LKGAVLllQTAi/ANAETPKYKFIKANCGOSFTI 
9101 GTTTAAAAGGTGCAGTTTTAATTTTACAAACTGCTGTTGCTAATGCTGAAACTCCAAAGTATAAGTTTATTAAAGCTAATTGTGGTGATAGTTTCACTAT 

ACAYGGTVVGLYPVTMRSNGTIRASFLAGACGS 
9201 AGCTTGTGCTTATGGTGGTACAGTTGTAGGACTCTACCCTGTTACTATGCGTTCTAATGGTACTATTAGAGCATCTTTTCTTGCGGGAGCCTGTGGTTCA 

VGFNIEKGVVNFFYMHHLELPNALHTGTDLMGEF 
9301 GTTGGTTTTAATATAGAAAAGGGTGTAGTTAATTTCTTTTATATGCACCATCTTGAGTTACCTAATGCATTACACACTGGAACTGACCTAATGGGTGAAT 

YGGYVDEEVAQRVPPDNLVTNNIVAWLYAAIIS 
9401 TCTATGGTGGTTATGTTGATGAAGAGGTTGCACAAAGAGTGCCACCAGATAATTTAGTTACTAACAATATTGTAGCATGGCTCTATGCGGCAATTATTAG 

VKESSFSLPKWLESTTVSVDOYNKWAGDNGFTP 
9501 TGTTAAGGAGAGTAGTTTCTCGCTGCCTAAATGGTTGGAGAGTACTACTGTTAGTGTTGATGATTATAATAAGTGGGCTGGTGACAATGGTTTTACACCA 

FSTSTAITKLSAITGVDVCKLLRTlPIVKNSQlilGG 
9601 TTTTCTACTAGTACCGCTATTACTAAATTAAGTGCTATAACTGGAGTTGATGTTTGTAAGCTCCTTCGCACTATTATGGTAAAAAATAGCCAGTGGGGTG 
DPILGQYNFEDELTPESVFNQIGGVRLQSSFVR
9701 GTGACCCCATTTTAGGGCAATATAATTTTGAAGATGAATTGACACCGGAGTCTGTATTTAATCAGATTGGTGGTGTTAGATTACAATCTTCTTTTGTAAG 

KATSWFUfSRCVLACFLFVLCAIVLFTAVPlKFY 

9801 AAAAGCTACATCTTGGTTTTGGAGTAGATGTGTGTTAGCTTGCTTCTTATTTGTGTTGTGTGCTATTGTCTTGTTTACGGCAGTGCCACTTAAATTTTAT 

vyaauillmaulfisftvkhvpiayrdtfllptli 

9901 GTATATGCAGCTGTTATTTTGTTAATGGCTGTACTTTTTATTTCTTTTACTGTTAAACATGTTATGGCATATATGGATACTTTTCTATTGCCAACATTGA 

TVIIGVCAEVPFIYNTLISQVVIFLSQliIYOPVU 
10001 TTACAGTTATTATTGGAGTTTGTGCTGAAGTGCCTTTCATCTACAATACTCTAATTAGTCAAGTTGTTATTTTCTTAAGTCAATGGTATGACCCAGTAGT 

FDTWUPyJWFLPLVLYTAFKCVQGCYMNSFNTSL 
10101 CTTTGATACTATGGTACCATGGATGTTCTTGCCACTAGTGTTGTATACTGCTTTTAAGTGTGTACAAGGTTGCTATATGAATTCTTTCAATACTTCTTTG 

LPILYQFt/KtGFVIYTSSNTLTAYTEGNUlElFFEL 
10201 TTAATGCTGTATCAGTTTGTGAAGTTAGGTTTTGTTATTTACACCTCTTCTAATACTCTTACTGCATACACAGAAGGTAATTGGGAGTTATTCTTCGAGT 

VHTTULANVSSNSLIGIFVFKCAKWPILYYCNAT 
10301 TGGTGCACACTACTGTGTTGGCTAATGTTAGTAGTAATTCTTTAATTGGTTTATTTGTTTTTAAGTGTGCTAAATGGATGTTGTATTATTGTAATGCAAC 

YLNNYVLMAVWVNCIGWLCTCYFGLYlilWVNKI/F 
10401 ATACTTAAACAATTATGTACTAATGGCAGTTATGGTTAACTGCATTGGCTGGCTCTGCACTTGTTACTTTGGGTTGTATTGGTGGGTTAATAAGGTTTTT 

GLTLGKYIMFKVSVDQYRYnCLHKINPPKTVWEUF 
10501 GGTTTAACCTTAGGTAAATACAATTTTAAAGTTTCAGTAGATCAATATAGGTATATGTGTTTGCACAAGATAAACCCACCTAAAACTGTGTGGGAAGTCT 

STNILIQGIGGORULPIATVQAKLSDVKCTTVV 
10601 TTTCGACAAATATACTTATACAAGGAATTGGTGGTGACCGTGTGTTGCCTATTGCTACAGTTCAAGCTAAATTGAGTGATGTAAAGTGTACAACTGTTGT 

LMQLLTKLNVEANSKNHVYLVELHNKILASDDV 
10701 TTTAATGCAGCTTTTGACTAAGCTTAATGTTGAAGCAAATTCAAAAATGCATGTTTATCTTGTTGAGTTACACAATAAAATTCTTGCTTCTGATGATGTT 

GECFiONLLGWLITLFCIDSTIOLSEYCODILKRS 
10801 GGAGAGTGCATGGATAATTTGTTGGGTATGCTTATAACACTATTTTGTATAGATTCTACTATTGATTTGAGTGAGTATTGTGATGACATACTTAAGAGGT

TVLQSVTQEFSHIPSYAEYERAKNLYEKVLVDS 
10901 CAACTGTATTACAATCGGTTACTCAAGAATTCTCACATATACCCTCTTATGCTGAATATGAAAGGGCTAAGAATCTTTATGAAAAGGTTTTAGTTGATTC 

KNGCVTQQELAAYRKAANIAKS1/FDRDLAVQKK 
11001 TAAAAATGGTGGTGTTACACAGCAAGAGCTTGCTGCATATCGTAAAGCTGCCAATATTGCAAAGTCAGTTTTTGATAGAGACTTGGCTGTCCAAAAGAAG 

LDSMAERAMTTMYKEARVTDRRAKLUSSLHALLF 
11101 TTAGATAGCATGGCAGAGCGTGCTATGACAACAATGTATAAAGAGGCGCGTGTAACAGATAGACGAGCAAAATTAGTCTCATCACTACATGCGTTACTTT 

snlkkioseklnvlfdqassgvvplatupivcs 

11201 TCTCAATGCTTAAGAAAATAGATTCTGAAAAGCTTAATGTCTTGTTTGACCAGGCTAGTAGTGGTGTTGTGCCCCTAGCGACTGTTCCAATTGTTTGTAG 

NKLTL\/IPDPETWVKCVEG\/H\/TYSTV\/(i!NIOT 
11301 TAATAAGCTTACACTTGTAATACCAGACCCAGAAACGTGGGTCAAGTGTGTGGAAGGTGTGCATGTTACATATTCAACAGTTGTTTGGAATATAGACACT 

VIDADGTELHPTSTGSGLTYCISGANIAtaJPLKUN 
11401 GTTATTGATGCCGATGGCACAGAGTTACACCCAACTTCTACAGGTAGTGGATTGACATACTGTATAAGTGGTGCTAATATAGCATGGCCTTTAAAGGTTA 

LTRNGHNKVDVVIQNNELNPHGVKTKACVAGVD 
11501 ACTTGACTAGGAATGGGCATAATAAGGTTGATGTTGTTTTGCAAAATAATGAGCTTATGCCACATGGTGTTAAAACAAAGGCTTGCGTAGCAGGTGTAGA 

QAHCSVESKCYYTNISGNSVVAAITSSNPNLKI/ 
11601 TCAAGCACATTGTAGCGTAGAGTCTAAATGTTATTATACAAATATTAGTGGCAATTCAGTTGTAGCTGCTATTACTTCTTCAAATCCAAATCTGAAAGTA

ASFLNEAGNQIYVDLDPPCKFGNKVGVKVEVVYL 
11701 GCTTCGTTTTTGAATGAGGCAGGCAATCAGATTTATGTAGACTTAGACCCACCATGTAAATTTGGCATGAAAGTGGGTGTCAAGGTTGAGGTTGTTTACT 

YFIKNTRSIVRGNVLGAISNyvyLQSKGHETEE 
11801 TGTATTTTATAAAGAATACAAGGTCGATTGTTAGGGGTATGGTACTTGGTGCTATATCTAATGTTGTTGTCTTACAGTCTAAAGGGCATGAAACAGAGGA 

VDAVGILSLCSFAU DPADTYCKYVAAGNQPLGN 
11901 AGTGGATGCTGTTGGCATTCTTTCACTATGTTCATTTGCAGTAGATCCCGCGGACACATATTGTAAATATGTGGCAGCAGGTAATCAACCTTTAGGTAAC 

CVKMLTl/HNGSGFAITSKPSPTPDQDSYGGASI/C 
12001 TGTGTTAAAATGTTGACAGTGCATAATGGTAGTGGTTTTGCTATAACTTCAAAGCCAAGTCCTACTCCTGACCAGGATTCTTATGGAGGAGCTTCTGTGT 
LYCRAHIAHPGSVGNLDGRCQFKGSFVQIPTTE
12101 GTCTCTATTGTAGAGCACACATAGCACATCCAGGAAGTGTAGGAAATTTAGATGGACGTTGTCAATTTAAAGGTTCTTTTGTGCAAATACCTACTACGGA 12200 

KDPVGFCLRNKUCTUCQCUIGYGCQCOSLRQPK 
12201 GAAAGACCCCGTTGGATTCTGTCTACGTAATAAGGTTTGCACTGTTTGCCAGTGTTGGATTGGTTATGGATGTGAGTGTGATTCACTTAGACAACCAAAA 12300 

5SVQSVACASDFDKNYLNGYGVAVRLG* 

12301 TCTTCTGTTCAATCAGTTGCTGGAGCATCTGATTTTGATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAGGCTCGGCTGATACCCCTTGCTAGTGG 12400 

PIFQNLKRNCARFQ'E 

12401 ATGTGATCCTGATGTTGTAAAGCGAGCCTTTGATGTTTGTAATAAGGAATCAGCTGGTATGTTTCAAAATTTGAAGCGTAACTGCGCTAGATTCCAGGAA 12500 

IRDTEDGNLEYLDSYFl/UKQTTPSNYEHEKSCYE 
12501 CTACGCGATACTGAAGATGGAAATCTTGAGTATCTTGATTCTTACTTTGTAGTTAAACAAACCACTCCTAGTAATTATGAACATGAAAAATCTTGTTACG 12600 

DLKSEVTADHDFFUFNKNIYNISRQRLTKYTPim 
12601 AAGACTTAAAGTCAGAAGTAACAGCTGACCATGACTTCTTTGTGTTCAATAAGAACATTTACAATATTAGTAGGCAACGGCTTACTAAATATACTATGAT 12700

DFCYALRHFOPKDCEVLKEILVTYGCIEDYHPK 
12701 GGACTTCTGCTATGCTTTGAGACATTTCGACCCAAAGGATTGTGAAGTTCTTAAAGAAATACTTGTCACTTATGGTTGTATAGAAGACTATCACCCTAAG 12800 

(iJFEENKDlilYOPIENSKYYVWLAKMGPIVRRALLN 
12801 TGGTTTGAGGAGAATAAGGATTGGTACGACCCAATAGAAAACTCAAAATATTATGTCATGTTGGCTAAAATGGGACCTATTGTACGACGTGCTTTATTGA 12900

AIEFGNLNVEKGYVGVITLDNQDINGKFYDFGD 
12901 ATGCTATTGAGTTCGGAAACCTTATGGTTGAAAAAGGTTATGTTGGTGTTATTACACTCGATAACCAAGACCTTAATGGCAAATTTTATGATTTTGGTGA 13000 

FqKTAPGAGVPVFDTYYSYNNPIIANTDALAPE 
13001 TTTTCAGAAGACGGCACCTGGTGCTGGTGTTCCTGTTTTTGATACGTATTATTCTTACATGATGCCCATCATAGCCATGACGGATGCTTTAGCACCTGAG 13100 

RYFEYDVHKGYKSYOLLKYDYTEEKQELFQKYFK 
13101 AGGTACTTTGAATATGATGTGCACAAGGGTTATAAATCTTATGATCTCCTCAAGTATGATTATACTGAGGAGAAACAAGAATTGTTTCAGAAGTACTTTA 13200 

YUIDQEYHPNCRDCSODRCLIHCANFNILFSTLI 
13201 AGTACTGGGATCAAGAGTATCATCCTAACTGCCGTGACTGTAGTGATGACAGGTGTTTGATACATTGTGCAAACTTCAATATCTTGTTTTCTACACTTAT 13300

PQTSFGNLCRKVFVDGVPFIATCGYHSKELGVI 
13301 ACCGCAGACTTCTTTCGGTAATTTGTGTAGAAAAGTTTTTGTTGATGGTGTACCATTTATAGCTACTTGTGGCTATCATTCTAAGGAACTTGGTGTTATT 13400 

NNQDNTNSFSKWGLSQLNQFVGDPALLVGTSNNL 
13401 ATGAATCAAGATAACACCATGTCTTTTTCAAAAATGGGTTTAAGTCAACTCATGCAGTTTGTTGGAGATCCTGCTTTGTTAGTGGGAACATCCAATAATT 13500 

UDLRTSCFSVCALTSGITHQTUKPGHFNKDFYO 
13501 TAGTTGATCTTAGAACGTCTTGTTTTAGTGTTTGTGCGTTAACATCTGGTATTACTCATCAAACGGTAAAGCCAGGTCACTTTAACAAGGATTTCTATGA 13600 

FAEKAGMFKEGSS1PLKHFFYPQTGNAAINDYD 
13601 TTTTGCAGAGAAGGCTGGTATGTTTAAGGAGGGTTCGTCTATACCACTTAAACATTTTTTCTATCCTCAAACTGGTAATGCTGCTATAAACGATTATGAT 13700 

YYRYNRPTNFDICQLLFCIEVTSKYFECYEGGCI 
13701 TATTATCGTTATAACAGGCCTACCATGTTTGACATATGTCAACTTCTATTTTGTTTAGAAGTGACTTCTAAATACTTTGAGTGTTATGAAGGCGGCTGTA 13800 

PASQVVVNNLDKSAGYPFNKFGKARLYYEMSLE 
13801 TACCAGCTAGCCAAGTTGTAGTTAACAACTTAGATAAGAGTGCAGGCTATCCATTTAATAAGTTTGGAAAAGCCCGCCTCTATTATGAAATGAGTCTAGA 13900

EQDQLFEITKKNVLPTITQfnNLKYAISAKNRAR 
13901 GGAACAGGACCAACTCTTCGAGATTACGAAGAAGAATGTCCTACCCACTATAACTCAAATGAATTTAAAATATGCCATATCCGCGAAAAATAGAGCGCGT 14000 

TVAGVSILSTNTNRQFHQKILKSIVNTRNAS1/VI 
14001 ACAGTGGCAGGTGTGTCTATCCTTTCTACTATGACTAATAGGCAGTTTCATCAGAAGATTCTTAAGTCTATAGTCAACACTAGAAATGCTTCTGTAGTT A 14100 

GTTKFYGGIjJONPILRNLIQGVEDPILrnGWDYPKC 
14101 TTGGAACAACCAAGTTTTATGGCGGTTGGGACAACATGTTGAGAAACCTGATTCAGGGTGTTGAAGACCCAATTCTTATGGGTTGGGATTATCCTAAGTG 14200 

DRAWPNLLRIAASLVLARKHTNCCSUSERIYRL 
14201 TGATAGAGCAATGCCTAATTTGTTGCGTATAGCAGCATCCTTAGTACTTGCTCGCAAACACACTAACTGTTGTAGTTGGTCTGAACGCATTTATAGGTTG 14300 

ynecaqvlsetvlatggiyvkpggtssgdattay 

14301 TATAATGAATGCGCCCAGGTCTTATCTGAAACTGTACTTGCTACAGGTGGTATTTATGTTAAACCTGGTGGCACTAGCAGTGGTGATGCTACTACTGCTT 14400 

AMSUFNIIQATSANVARLLSUITRDIVYDNIKS 
14401 ATGCAAACAGTGTTTTTAACATAATACAAGCCACATCTGCTAATGTTGCGCGTCTTTTGAGTGTTATAACGCGTGATATTGTCTATGATAATATTAAGAG 14500
lqyelyqqvyrrvnfdpafvekfysylcknfsl

14501 CnGCAGTATGAATTGTATCAGCAGGTCTACAGGCGAGTTAATTTTGACCCAGCCTTTGTTGAAAAGTTTTATTCTTACTTATGTAAGAATTTTTCGTTG 

(niLSDDGVUCYNNTLAKQGLVADISGFREVLYYQ 
14601 ATGATCTTGTCTGftCGACGGTGTTGTTTGTTACAACAACACATTAGCCAAACAAGGTCTTGTAGCAGATATTTCTGGTTTTAGAGAGGTTCTCTACTATC 

NNVFMADSKCWl/EPDLEKGPHEFCSQHTNLVEV 
14701 AGAATAATGTTTTTATGGCTGATTCTAAATGTTGGGTTGAACCAGATTTAGAAAAAGGCCCACATGAGTTTTGTTCACAACACACAATGCTAGTGGAGGT 

dgepkyipypdpsrilgacvfvddvdktepvav 
14801 tgatggtgaccctaagtatttgccatacccagacccttcacgcattttgggtgcatgtgtttttgtagatgacgtggataagacagaacctgtggctgtt 

meryialaidayplvhheneeykkvffvllayir 
14901 atggagcgttatatagctcttgccatagatgcttatccactagtacatcatgaaaatgaagagtacaagaaggtattctttgttctccttgcatatatca 

KtYQELSQNNinDYSFVmDIDKGSKFWEQEFYE 
15001 gaaaactctatcaagagctttctcagaatatgcttatggactactcttttgtaatggatatagacaagggtagtaaattttgggaacaggagttctatga 

NPIYRAPTTLQSCGVCVVCNSQTILRCGNCIRKP 
15101 GAATATGTATAGAGCTCCTACGACTTTACAATCTTGTGGCGTTTGTGTAGTTTGTAATAGTCAAACTATACTACGCTGCGGTAATTGTATTCGTAAACCG 

FLCCKCCYOHVMHTDHKNVLSINPYICSQLGCGE 
15201 TTTTTGTGTTGTAAGTGTTGCTATGACCACGTCATGCATACGGACCACAAAAATGTTTTATCTATAAATCCTTATATTTGCTCACAGCTAGGTTGCGGTG 

ADVTKLYLGGf*ISYFCGNHKPKLSIPLVSNGT\/F 
15301 AAGCAGATGTTACTAAATTGTACCTCGGGGGTATGTCGTACTTCTGTGGTAATCATAAACCGAAATTGTCAATACCGTTAGTATCTAATGGTACTGTTTT 

GXYRANCAGSENVDDFNQLATTNIilSIVEPYILA 
15401 TGGAATTTACAGGGCTAATTGTGCTGGTAGTGAAAATGTTGATGATTTTAATCAACTAGCTACTACTAATTGGTCCATTGTCGAACCTTATATTTTAGCA 

nrcsoslrrfaaetvkateelhkqqfasaevrev 

15501 AATCGCTGTAGTGATTCATTGAGACGTTTTGCTGCAGAGACAGTAAAAGCCACAGAAGAATTACATAAGCAACAATTTGCTAGTGCAGAAGTGCGAGAAG

FSDRELILSWEPGKTRPPLNRNYVFTGYHFTRT 
15601 TATTCTCAGATCGTGAATTGATTCTATCATGGGAACCAGGAAAAACCAGGCCGCCATTGAATAGAAATTATGTTTTCACAGGTTATCACTTTACAAGAAC 

SKVQLGDFTFEKGEGKDVVYYKATSTAKLSUGO 
15701 TAGTAAGGTGCAGCTTGGTGATTTTACATTTGAAAAAGGTGAAGGTAAGGATGTTGTCTATTATAAAGCAACGTCTACTGCTAAATTGTCTGTAGGAGAC 

ifvltshnvvslvapticpqqtfsrfvnlrpnvh 
15801 atttttgttttaacctcacacaatgttgtttctctcgtagcgccaacattgtgtccacaacaaaccttttctaggtttgtaaatttaagacctaatgtaa 

vpecfunniplyhlugkqkrttuqgppgsgksh 
15901 tggtacctgaatgttttgtaaataacattccactttaccatttagtaggtaaacagaagcgtactacagtacaaggtcctcctggcagtggtaaatccca 

faiglavyfssarvvftacshaavdalcekafk 
16001 ctttgctataggccttgcagtatactttagtagcgctcgtgttgtttttactgcatgttctcatgcagctgttgatgctttatgtgaaaaagcttttaag 

flkvddctrii/pqrttvdcfskfkandtgkkyif 
16101 tttcttaaagttgatgattgcactcgtatagtaccccaaaggactactgtcgattgcttctcaaaatttaaagctaatgacacaggcaaaaagtacattt 

stinalpewscdillvdeuspiltnyelsfingk 
16201 ttagtactattaatgccttgccggaagttagttgtgatattcttttggttgacgaggttagtatgttgaccaattacgaattgtcctttattaatggtaa 

inyqyvvyvgdpaqlpaprtllngslspkdynu 
16301 gataaattaccaatatgttgtgtatgtaggtgatccggctcaattaccggcaccccgcactttacttaatggttcactttctccaaaggattataatgtt 

l/TNLNVCVKPDIFLAKCYRCPKEIVDTVSTLVYD 
16401 GTCACAAACCTTATGGTTTGTGTTAAACCTGATATTTTCCTTGCAAAGTGTTATCGTTGTCCTAAGGAAATTGTAGACACTGTGTCTACTCTTGTTTATG 

gkfiannpesrecfkvivnngnsdvghesgsay 
16501 atggaaagtttattgcaaataacccagaatcacgtgagtgtttcaaggttatagttaataatggcaattctgatgtaggacatgaaagtggttcagccta 

nttqlefvkdfvcrnkqwreaifispynannqr 
16601 caacacaacacaattggaatttgtgaaagactttgtttgtcgcaataaacaatggcgggaagcaatatttatttcaccttacaatgctatgaaccagaga 

AYRNLGLNVQTVDSSQGSEYDYVIFCVTADSQHA 
16701 GCTTACCGTATGCTTGGACTTAATGTTCAAACAGTAGATTCTTCTCAAGGTTCAGAGTATGATTATGTCATCTTCTGTGTTACTGCAGATTCGCAGCATG 

LNINRFNVALTRAKRGILVVNRQRDELYSALKF 
16801 CACTGAATATTAATAGATTTAATGTGGCGCTTACAAGAGCTAAGCGTGGTATACTAGTTGTCATGCGCCAGCGTGATGAATTGTATTCTGCTCTTAAGTT 
teldsetslqgtglfkicnkefsgvhpayavtt

16901 TACAGAGCTAGATAGTGAAACAAGTCTGCAAGGTACAGGTTTGTTTAAAATTTGCAACAAAGAATTTAGTGGTGTCCATCCTGCTTATGCAGTCACAACT 

KALAATYKVNDELAALi/Nt/EAGSEITYKHLISLL 
17001 AAGGCTCTTGCTGCAACCTATAAAGTTAATGATGAACTTGCTGCACTTGTTAATGTGGAAGCTGGTTCAGAAATAACATATAAACATCTTATTTCTCTGT 

GF KMSVNVEGCHNMFITRDEAIRNVRGlilVGFDV 
17101 TAGGATTCAAGATGAGTGTTAATGTTGAAGGCTGCCACAACATGTTTATAACACGTGATGAGGCAATCCGCAATGTAAGAGGTTGGGTAGGTTTTGATGT 

eathacgtnigtnlpfqvgfstgadfvvtpegl 
17201 agaagcaacacatgcttgtggcactaacattggtactaacctgccttttcaagtaggtttctctactggtgcagactttgtagtcacgcctgagggactt 

vdtsignnfepvnskappgeqfnhlrvlfksakp 
17301 gtagatacttcaataggcaataattttgagcctgtgaattctaaagcacctccaggtgaacaatttaaccacttgagagtgttatttaaaagtgctaaac 

WHVIRPRIl/QWLADNLCNVSDCVVFt/TWCHGtE 
17401 CTTGGCATGTTATAAGACCAAGGATAGTGCAGATGTTAGCAGACAATCTATGCAACGTTTCAGATTGTGTAGTGTTTGTCACATGGTGTCATGGCCTAGA 

LTTIRYFVKIGKEQVCSCGSRATTFNSHTQAYA 
17501 ACTAACTACTTTGCGCTATTTTGTTAAAATAGGCAAGGAACAAGTTTGTTCTTGTGGTTCTAGAGCTACAACTTTTAATTCTCATACTCAAGCTTATGCT 

cwkhclgfdfvynpllvdiqqwgysgnlqfnhdl 

17601 TGTTGGAAGCATTGTTTGGGTTTTGATTTTGTTTATAACCCACTTCTAGTGGATATTCAACAGTGGGGTTACTCGGGTAACCTACAGTTTAATCATGATT 

HCNVHGHAHVASVDAINTRCLAINNAFCQDVNli) 
17701 TGCACTGTAATGTGCATGGCCACGCTCATGTAGCTTCTGTTGACGCTATAATGACTCGTTGTCTTGCAATTAACAATGCATTTTGTCAAGATGTCAACTG 

DLTYPHIANEDEVNSSCRYLQRFIYLNACUOALK 
17801 GGATTTGACATACCCTCACATTGCAAATGAGGATGAAGTCAATTCTAGTTGTAGATATCTACAACGCATGTATCTTAATGCGTGTGTTGATGCTCTTAAA

VNVUYOIGNPKGIKCVRRGOVNFRFYDKNPIVRN 
17901 GTTAATGTTGTCTATGATATAGGCAACCCTAAAGGTATTAAATGTGTTAGGCGTGGGGATGTTAATTTTAGATTCTATGATAAGAATCCAATAGTACGCA 

VKQFEYDYNQHKDKFADGLCmFUJNCNUDCYPDN 
18001 ACGTCAAGCAGTTTGAGTATGACTATAATCAGCACAAAGATAAGTTTGCTGATGGTCTTTGTATGTTTTGGAATTGTAATGTGGATTGTTATCCTGATAA 

SLt/CRYDTRNLSVFNLPGCNGGSLYVNKHAFYT 
18101 TTCCTTGGTTTGTAGGTATGACACACGAAATTTGAGTGTGTTTAACCTACCAGGCTGTAATGGTGGTAGTCTGTACGTTAACAAACATGCATTCTACACA 

PKFDRISFRNLKANPFFFYDSSPCETIQVDGVAQ 
18201 CCTAAATTTGACCGCATTAGCTTCCGCAATTTGAAAGCTATGCCATTCTTTTTTTATGACTCATCGCCTTGTGAAACCATTCAAGTGGATGGAGTTGCGC 

DLVSLATKDCITKCNIGGAVCKKHAQMYAEFUT 
18301 AAGACCTTGTGTCTCTAGCTACGAAAGACTGTATCACAAAGTGCAACATTGGTGGCGCTGTTTGTAAGAAACATGCCCAGATGTATGCAGAATTTGTGAC 

SYNAAVTAGFTFHJVTNKLNPYNIWKSF5ALQSI 
18401 TTCTTACAATGCAGCTGTCACAGCTGGCTTTACTTTCTGGGTAACTAATAAACTTAACCCTTATAACTTATGGAAAAGTTTTTCAGCTCTCCAGTCTATC 

DNIAYNMYKGGHYDAIAGEMPTVITGDKVFVIOQ 
18501 GACAATATTGCTTATAATATGTATAAGGGTGGTCATTATGATGCTATTGCTGGAGAAATGCCCACTGTCATAACTGGAGACAAAGTTTTTGTTATTGATC 

GVEKAVFVNQTTLPTSVAFELYAKRNIRTLPNN 
18601 AAGGTGTAGAAAAGGCAGTTTTTGTTAATCAAACAACTCTACCTACATCTGTGGCGTTTGAGCTATATGCAAAGAGAAATATTCGCACACTGCCAAACAA 

RIlKGLGVDUTNGFVIWDYANQTPLYRNTVKVC 
18701 CCGTATTTTGAAAGGTTTAGGTGTAGACGTAACCAATGGATTTGTAATTTGGGATTATGCTAACCAAACACCATTGTATCGTAATACCGTCAAGGTATGT 

AYTDIEPNGIVVLYDDRYGDYQSFLAADNAVLVS 
18801 GCATATACAGATATTGAGCCAAATGGCCTAGTAGTTCTGTATGATGATAGATATGGTGATTACCAGTCTTTTCTTGCTGCTGATAATGCTGTTCTAGTTT 

TQCYKRYSYVEIPSNLLVQNGNPLKDGANLYVY 
18901 CTACACAGTGTTATAAGCGATATTCATACGTAGAAATACCATCTAATTTGCTCGTTCAGAATGGTATGCCATTAAAAGATGGAGCGAACCTGTATGTTTA 

KRVNGAFVTlPNTINTQGRSYETFEPRSDIERO 
19001 TAAGCGTGTTAATGGTGCGTTTGTTACACTACCTAACACAATAAACACCCAGGGTCGAAGTTATGAAACTTTTGAACCTCGTAGTGACATTGAGCGTGAT 

FLANSEESFVERYGKDLGLQHILYGEVDKPQLGG 
19101 TTTCTCGCTATGTCAGAGGAGAGTTTTGTAGAAAGGTATGGTAAAGACTTAGGCCTACAACACATACTGTATGGTGAAGTTGATAAGCCCCAATTAGGTG 

LHTt/IGNYRLLRANKLNAKSVTNSDSDVWQNYF 
19201 GTTT ACACACTGTTAT AGGT ATGT ACAGACTCTTACGTGCGAATAAGTTGAACGCAAAGTCTGTAACTAATTCGGATTCTGATGTCATGCAAAATTACTT 
vlsdngsykqvctvvdlllddflellrnilkey

19301 TGTATTGTCGGACAATGGTTCTTACAAGCAAGTGTGTACTGTTGTGGATTTACTGCTTGATGATTTCTTAGAACTTCTTAGAAACATACTTAAGGAGTAT 19400 

GTNKSKVVTVSIDYHSINFNTIiIFEDGSIKTCYPQ 
19401 GGT ACT AAT AAGTC AAA AGTTGT AACAGTGTCAATTGATT ACC AT AGC AT AAATTTT ATGACTTGGTT TGA AGATGGCAGT ATT AAA ACATGTT ATCCAC 19500 

LQSAli/TCGYNNPELYKVQNCVNEPCNIPNYGVG 
19501 AGCTTCAATCAGCATGGACGTGTGGTTATAATATGCCTGAACTTTATAAAGTTCAGAATTGTGTTATGGAACCTTGCAACATTCCTAATTATGGTGTTGG 19600 

ITLPSGILnNVAKYTQLCQYLSKTTICVPHNMR 
19601 AATAACGTTGCCTAGCGGTATTCTTATGAATGTGGCAAAGTATACACAACTTTGTCAATACCTTTCGAAAACAACAATTTGTGTACCGCATAACATGCGA 19700 

VMHFGAGSDKGVAPGSTVLKQWLPEGTLLVDNDI 
19701 GTAATGCATTTCGGAGCAGGAAGCGACAAAGGAGTGGCGCCAGGTAGTACTGTTCTTAAACAATGGCTCCCAGAAGGGACACTCCTTGTCGATAATGATA 19800

VDYVSDAHVSVLSDCNKYNTEHKFDLVISDNYT 
19801 TTGTAGACTATGTGTCTGATGCACATGTTTCTGTGCTTTCAGATTGCAATAAATATAATACAGAGCACAAGTTTGATCTTGTGATATCTGATATGTATAC 19900

DNDSKRKHEGVIANNGNDDUFIYLSSFLRNNLA 
19901 AGATAATGATTCAAAAAGAAAGCATGAAGGCGTGATAGCCAATAATGGCAATGATGACGTTTTCATATATCTCTCAAGTTTTCTTCGTAACAATTTGGCT 20000 

LGGSFAVKVTETSWHEVLYDIAQDCAliJWTfflFCTA 
20001 CTAGGTGGTAGTTTTGCTGTAAAAGTGACAGAGACAAGTTGGCACGAAGTTTTATATGACATTGCACAGGATTGTGCATGGTGGACAATGTTTTGTACAG 20100 

UNA5SSEAFLIGVNYLGA5EKVKVSGKTLHANY 
20101 CAGTGAATGCCTCTTCTTCAGAAGCATTCTTGATTGGTGTTAATTATTTGGGTGCAAGTGAAAAGGTTAAGGTTAGTGGAAAAACGCTGCACGCAAATTA 20200 

IFWRNCNYLQTSAYSIFDVAKFDLRLKATPVVN 
20201 TATATTTTGGAGGAATTGTAATTATTTACAAACCTCTGCTTATAGTATATrTGACGTTGCTAAGTTTGATTTGAGATTGAAAGCAACGCCAGTTGTTAAT 20300 

lkteqktdlvfnlikcgkllvrdvgntsftsdsf 

MLVTPLLLl/TL 

20301 TTGAAAACTGAACAAAAGACAGACTTAGTCTTTAATTTAATTAAGTGTGGTAAGTTACTGGTAAGAGATGTTGGTAACACCTCTTTTACTAGTGACTCTT 20400 
V C T N * 

LCALCSAVLYDSSSYVYYYQSAFRPPSGliiHLQG 
20401 TTGTGTGCACTATGTAGTGCTGTTTTGTATGACAGTAGTTCTTACGTTTACTACTACCAAAGTGCCTTCAGACCACCTAGTGGTTGGCATTTACAAGGGG 20500 


Fig. 2. The sequence of the ‘unique’ region of mRNA F from the Beaudette strain of IBV. Translations
of the ORFs are shown in single-letter amino acid code. The amino acid is shown above the first base of
the appropriate codon. The translation starting at position 20368 is the NH2 terminus of the spike
precursor protein.



Fig. 3. Diagram showing the positions of the main ORFs in the ‘unique’ region of mRNA F. The two
large ORFs, designated F1 and F2, are shown, as well as a small ORF at the 5' end of the genome, and
the start of the spike precursor gene, which overlaps with F2.


The second large ORF, F2, extends into the ‘unique’ region of mRNA E and in fact overlaps
the coding sequences for the spike protein gene by 16 amino acids. 

Potential sources of error 

All the sequence information has been confirmed by sequencing M13 clones obtained from
both strands of the DNA. In addition, most of it has been sequenced several times from different
M13 clones. The 14 cDNA clones used to obtain the sequence of mRNA F contain, including
overlaps, 24765 bases. During the shotgun sequencing of these clones 203113 bases have been
sequenced, so that each base has, on average, been sequenced 8.2 times. However, there are two
regions we have checked more carefully. The first is at positions 12340 to 12390, where F1 ends
and F2 begins. An error here leading to a frameshift could make the difference between two
large ORFs and one very large ORF. The second is at position 167, where the very small 11 amino
acid ORF ends. A frameshifting error here could mean that this first ORF can continue for
another 77 amino acids until position 397. There are two possible sorts of error. The first is an
artefact in the sequencing gels leading to a misreading. The sequence on both strands appears
perfectly clear in both these regions. Both regions have been sequenced using formamide gels and
high-temperature gels, and also using deoxyinosine triphosphate (Bankier & Barrell,
1983) or deoxy-7-deazaguanosine triphosphate (Mizusawa et al., 1986) in place of deoxyguanosine
triphosphate, and cytosine-modified sequence reaction products (Ambartsumyan & Mazo,
1980), to avoid gel compressions.
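
The redundancy figure quoted above follows directly from the two totals given in the text. A minimal sketch of the arithmetic (the function name is only illustrative and is not part of the original analysis):

```python
def mean_coverage(bases_read: int, clone_length: int) -> float:
    """Average number of times each base was read (sequencing redundancy)."""
    if clone_length <= 0:
        raise ValueError("clone_length must be positive")
    return bases_read / clone_length


# Totals quoted in the text: 203113 bases read over 24765 bases of cDNA clones.
redundancy = mean_coverage(bases_read=203113, clone_length=24765)
print(f"{redundancy:.1f}")  # 8.2
```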

The second potential source of error is either a reverse transcriptase error during the synthesis 
of the cDNA or the occurrence of a mutant RNA molecule from which the cDNA was copied, 
both of which would lead to an incorrect cDNA clone. In the case of position 167 the sequence 
has been obtained from an equivalent clone from the M41 strain of IBV and is identical. In the 
case of the sequence between F1 and F2 the sequence has been confirmed from two additional
independent cDNA clones, by sequencing directly from the double-stranded DNA using an
oligonucleotide primer (Korneluk et al., 1985). Fig. 4(a) shows the relevant sequence in this
region and Fig. 4(b) shows a sequencing gel of bases 12333 to 12390 obtained directly from a 
cDNA clone using an oligonucleotide primer. In addition the sequence has been obtained 
directly from the virion RNA using specific oligonucleotide primers at both of these points and 
has confirmed the original gel readings. At positions 12333 to 12390 the sequence has also been 
obtained from virion RNA obtained from the M41 strain of IBV, and the sequence in this region 
is identical. 

Gel compressions are thought to be caused by the presence of hairpin loops in the DNA
migrating down the gel. Examination of the sequence in these regions shows that there are
several possibilities for the formation of fairly large hairpins, including, for example, at the
position between F1 and F2, the sequence GGGGTA with its exact complement TACCCC 24
bases further on. At this position (12380), in the region where the reading frame changes
between F1 and F2, the sequence has been determined from ten separate M13 clones. It is
interesting to note that one of these clones gave a different sequence reading in that a CT
dinucleotide, which appears in the other nine M13 readings, was not present. This is unusual as
normally all independent M13 clones agree. It is possible that the secondary structure in this
region has some effect on the fidelity of copying by polymerases.
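
The potential hairpin described above can be located mechanically by searching for a short stem whose reverse complement occurs a little further downstream. The following is a minimal sketch only; the function names, the stem length of 6 and the maximum spacing are assumptions made for illustration, and the test fragment is the stretch of the F1/F2 junction shown in Fig. 4(a) and Fig. 8.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq: str, stem_len: int = 6, max_offset: int = 40):
    """List (start, partner_start, stem, partner) for possible hairpin stems.

    Indices are 0-based; partner_start - start is the spacing between the
    first base of the stem and the first base of its reverse complement.
    """
    hits = []
    for i in range(len(seq) - stem_len + 1):
        stem = seq[i:i + stem_len]
        partner = reverse_complement(stem)
        j = seq.find(partner, i + stem_len)
        while j != -1 and j - i <= max_offset:
            hits.append((i, j, stem, partner))
            j = seq.find(partner, j + 1)
    return hits


# Fragment of the F1/F2 junction, starting at the GGGGTA motif.
junction_fragment = "GGGGTAGCAGTGAGGCTCGGCTGATACCCC"
for start, partner_start, stem, partner in inverted_repeats(junction_fragment):
    print(stem, partner, partner_start - start)  # GGGGTA TACCCC 24
```

A real secondary-structure prediction would also weigh loop size and base-pairing energy; this sketch only reports exact inverted repeats.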

Computer analysis 

Extensive computer analysis has been carried out in an attempt to identify some salient 
features on the bleak landscapes of these large ORFs. Searches for homologies with other viral 
polymerases have been performed using the NBRF protein identification resource (George et
al., 1986). Short regions of fairly low homology with several viral polymerases can be identified
but in general they do not rise significantly above the background of matches with proteins that
are apparently unrelated. One region, between amino acids 1342 and 1350, has a fairly good
match (8/9 amino acids) with the nsP2 protein of Sindbis virus, a protein which is known to be
involved in RNA replication (Strauss & Strauss, 1983). This region also has a match with the 1a
protein of brome mosaic virus. These matches are shown in Fig. 5. One of the most interesting
matches is at the 5' end of the first large ORF. The first 300 amino acids have a low-level but 
extensive homology with the replication initiation protein from Escherichia coli (Germino & 
Bastia, 1982). The homology is statistically significant and it may indicate that this region of the 
polymerase protein is involved in initiation of replication of either the positive or negative 
strands. 

The predicted amino acid sequences of the large ORFs have been compared against 
themselves and against each other to see whether there are any repeats which might represent 
SLRQPKSSVQSVAGASDFDKNYLNGYGVAVRLG*YPL
FT*TTKIFCSISCWSI*F**ELFKRVRGSSEARLIPL
IHLDNQNLLFNQLLEHLILIRII*TGTG*Q*GSADTP

ATTCACTTAGACAACCAAAATCTTCTGTTCAATCAGTTGCTGGAGCATCTGATTTTGATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAGGCTCGGCTGATACCCCT 
12290 12300 12310 12320 12330 12340 12350 12360 12370 12380 12390 


LVDVILML*SEPLMFVIRNQLVCFKI*SVTALDSRN
ASGCDPDVVKRAFDVCNKESAGMFQNLKRNCARFQE
C*WM*S*CCKASL*CL**GISWYVSKFEA*LR*IPG
GCTAGTGGATGTGATCCTGATGTTGTAAAGCGAGCCTTTGATGTTTGTAATAAGGAATCAGCTGGTATGTTTCAAAATTTGAAGCGTAACTGCGCTAGATTCCAGGAA 
12400 12410 12420 12430 12440 12450 12460 12470 12480 12490 12500



(c) 
BMV SCHRLLVDEAGLLHYGQLLVVAALSKCSQVLAF-GDTEQ-ISFKSRDAGFKLLHGNLQYDRRDV-VHKTYRCPQDVIAAVNLLKRKCGNRDTKY


IBV SCDILLVDEVSMLTNYELSFINGKINYQYVVYV-GDPAQLPAPRTLLNGSLSPKDYNVVTNLMVCVKPDIFLAKCYRCPKEIVDTVSTLVYDGKFIANNP


SV A^Ei/L YVDE AFACHAGALLALIAIURPRKKUyLCGDPWQ-CGFFNmQLKUHFMHPEKDICTK-TFYKYISRRCTQPUTAIVSTLHYDGKFKTTNP 

Fig. 5. Comparison between amino acid sequences of brome mosaic virus (BMV), infectious bronchitis 
virus (IBV) and Sindbis virus (SV). The BMV sequences are amino acids 748 to 838 of the 1a protein.
The SV sequences are amino acids 785 to 878 of the nsP2 protein. The IBV sequences are amino acids
1248 to 1356 of F2. A colon shows identical amino acids and a dot shows similar (Kanehisa, 1982) 
amino acids. The dashes in the sequences are blank characters inserted to achieve optimal alignment. 


two separate but similar polymerases. A dot matrix comparison, such as DIAGON (Staden,
1982a), reveals no repeats. However, several low homology repeats can be detected using the
program FASTP (Lipman & Pearson, 1985). These are shown in Fig. 6(a) beneath a
hydrophilicity plot (Kyte & Doolittle, 1982) of the amino acid sequences of F1 and F2. Fig. 6(b
to e) shows the amino acid matches in these regions. The spacing between the repeats marked A
and B is very similar in both cases, 1157 amino acids in F1 and 1183 amino acids in F2. It is
possible that these represent residual domains of homology between two polymerases which 
were at one time more closely related. The areas marked C and D also show regions of homology. 
The diagram also shows several very hydrophobic regions in the first large ORF which represent 
potential membrane-spanning domains. 
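
The hydropathy profile referred to above can be sketched as a moving average of Kyte & Doolittle (1982) index values. The window of 41 residues and the step of 21 residues are taken from the legend to Fig. 6; the function name and the output format are assumptions made for illustration, not the program actually used.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 41, step: int = 21):
    """Return (centre_position, mean_hydropathy) pairs along a protein.

    centre_position is 1-based. Positive means are hydrophobic and negative
    means are hydrophilic. Non-standard residue codes raise KeyError.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    profile = []
    for centre in range(half, len(protein) - half, step):
        segment = protein[centre - half:centre + half + 1]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((centre + 1, mean))
    return profile


# Artificial example: a hydrophobic stretch followed by a charged stretch.
for position, value in hydropathy_profile("I" * 41 + "R" * 41):
    print(position, round(value, 2))
```

Long runs of strongly positive values are the kind of feature interpreted in the text as potential membrane-spanning domains.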

Computer analysis has also detected a homology between the non-coding region at the 5' end 
of the positive strand, and the 5' end of the negative strand (i.e. the reverse complement of the 
non-coding region at the 3' end of the positive strand). This is shown in Fig. 7. These sequences, 
on the positive and negative strands, are approximately the same distance from their 5' ends, 52 
bases and 48 bases [excluding the poly(A) tail] respectively, and may play some role in the 
replication of the positive and negative strands. 

Homology regions 

At position 599 the sequence CTGAACAA occurs. This is identical to the sequence which
occurs in the ‘homology regions’ at the 5' ends of the bodies of mRNAs D and E (Boursnell et al.,
1985b; Binns et al., 1985b). These sequences are thought to be recognition sites for binding of
the polymerase/leader complex during the synthesis of the subgenomic RNAs (Baric et al.,
1983). The same sequence CTGAACAA occurs at position 3293. Neither of these positions is
known to be situated at the 5' end of an mRNA species, as are all the other homology regions.
We have attempted to determine whether there is some feature of the sequence context
surrounding these homology regions which sets them apart from homology regions which are
known to occur at the 5' end of the bodies of mRNAs. Accordingly, a consensus sequence has
been calculated from the sequences surrounding the known homology regions at the ends of
mRNAs A to F. This consensus sequence includes six bases to the left of the core homology


Fig. 4. (a) The nucleotide sequence in the region between F1 and F2, with a translation in single-letter
amino acid code of three reading frames. The amino acid is shown above the second base of the
appropriate codon. Stop codons are marked as asterisks. The frames which are open in F1 and F2 are
underlined and the methionine at the start of F2 is boxed in. (b) A DNA sequencing gel obtained by
sequencing a double-stranded cDNA clone using an oligonucleotide primer. The sequence shown is
from 12333 to 12390, and is the reverse complement of the sequence shown in (a). (c) The same three
reading frames as shown in (a), with a graph for each showing the extent to which that reading frame
conforms to the codon usage found for the amino acid sequence of F1 and F2. The frame which
conforms best to the F1/F2 codon usage is marked with a series of dots and marked F1 or F2. Stop
codons are marked as short vertical lines along the centre of each frame, and start codons as bars with
filled-in circles on top. The two stop codons at 12339 (TAA) and 12382 (TGA) are marked, as is the start
codon at 12459. The program used is the ‘codon usage’ option from ANALYSEQ (Staden, 1984b,
1983c) and uses the method of Staden & McLachlan (1982). The parameters used were a window length
of 25 and an output length of 1. (Codon usage analysis from the spike, membrane and nucleocapsid gene
data gives a very similar result.)
(a)



(b) Repeat A

F1 484 EFVKTYVCKAQmSiyiLAAULGEDIliJHLVSQVIYKLGyLFTKWOFC-DKHtiiKGFCVQLKRAKLIVTE
F2 1387 EFVKDFVCRNKQW-REAIF-ISPYNAMNQRAYRMLGLNVQTVDSSQGSEYDYVIFCVTADSQHALNIN

F1 TFCVLKGVAQHCFQLLLDAIHSLYKSFKKCALGR-IHGDLLF
F2 RFNVALTRAKRGILVVMRQRDELYSALKFTELDSETSLQGTGLF


(c) Repeat B

F1 1630 VKMGDKIGGVTMGLWRAEHLNKPNLERIFNIAKKAIVGSSVVTTQCGKLIGKAATFIADKVGGGVVRNITD
F2 2570 VKVSGKTLHANYIFWRNCNYLQTSAYSIFDVAKFDLRLKATPVVNLKTEQKTDLVFNLIKCGKLLVRDVGN

(d) Repeat C

F1 3696 VKTKACUAGVDQAHCSVESKCYYTNISGNSWAAITSSNPN-LKVASFLNEAGN—QI
F2 1996 VKPTAYAYWOEA-CLUDDFVNLKYKAATPGKDSASSAVKCFSVTDFLKKAUFLKEALKCEQI


(e) Repeat D

F1 3438 LFCIDSTIDLSE-YCDDILKRSTVLQSVTQEFSHIPSYAEYERAKNLYEKULDSKNG—GVT
F2 430 LFCLEVTSKYFECYEGGCIPASQVVVNNLDKSAGYP-FNKFGKARLYYEMSLEEQDQLFEIT

Fig. 6. (a) Hydropathicity plots (Kyte & Doolittle, 1982) of the predicted amino acid sequences of
ORFs F1 and F2. Values above the line are hydrophobic and values below the line are hydrophilic. The
hydropathicity is calculated using a moving window of 41 amino acids, with a value plotted every 21
residues. The pairs of bars marked A, B, C and D show regions of partial homology [see Results and (b)
to (e)]. (b to e) Amino acid sequences of the matches depicted by the bars in (a). A colon shows identical
amino acids and a dot shows similar (Kanehisa, 1982) amino acids. The dashes in the sequences are 
padding characters inserted to achieve optimal alignment. 


region CT(T/G)AACAA present in all the regions, the eight bases of the core homology itself,
and four bases to the right. The consensus has been compared to the complete sequence using
the computer program FITCONSENSUS (Devereux et al., 1984). The program successfully
identifies the known homology regions with scores ranging from 74.6 to 64.1. The 14 next best
fitting regions identified have a range of scores well separated from those of the known
52 TTTAACTTAACAAAACGGACTTAAATACCTACAGCTGGTCCTCATAGGTGTTCCATTGCAGTGCACT 118

48 TTAAACTTAACTTAA-ACTAAAATT—TAGCTCTTCCCCTAATGGGCGTCCTAGTGCTGTACCCT 109

Fig. 7. Comparison between (top) the nucleotide sequence of the 5' end of the genome and (bottom) the
reverse complement of the 3' end of the genome (i.e. the 5' end of the negative strand). Colons show
identical bases. The dashes in the sequences are padding characters inserted to achieve optimal
alignment.


GASDFDKNYLNGYGVAVRLG*
GGAGCATCTGATTTTGATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAGGCTCGGCTGATACCCCTTGCTAGTG

GTAGCTATGGTTAGAGGGAGTATCCTAGGAAGAGATTGTCTGCAGGGCCTAGGGCTCCGCTTGACAAATTTATAGGGA
VAMVRGSILGRDCLQGLGLRLTNL*

Fig. 8. Nucleotide and predicted amino acid sequences where ribosomal frameshifting may occur. The
top sequence is at the F1/F2 junction of IBV, and the bottom sequence is at the gag/pol junction of Rous
sarcoma virus. Colons show identical bases.


homology regions, with a tight cluster of scores (53.6 to 58.8). The CTGAACAA sequence at
position 599 scores even lower. It seems probable, therefore, that the two CTGAACAA 
sequences at 599 and 3293 are chance matches with the core sequence, but when surrounding 
sequences are taken into account the differences are enough to ensure that they are not major 
sites for the binding of the leader/polymerase complex. 
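
Locating every occurrence of the core homology CT(T/G)AACAA is the first step of the search described above. The sketch below does only that; it is not the FITCONSENSUS program and does not reproduce its scoring of the flanking bases. The function name and the test sequence are hypothetical.

```python
import re

# Core homology as given in the text: CT, then T or G, then AACAA.
CORE_HOMOLOGY = re.compile(r"CT[TG]AACAA")


def find_core_homology(seq: str):
    """Return (1-based position, matched sequence) for each core homology."""
    dna = seq.upper().replace("U", "T")
    return [(m.start() + 1, m.group()) for m in CORE_HOMOLOGY.finditer(dna)]


# Hypothetical test sequence containing both variants of the core.
toy = "AAACTGAACAATTTCTTAACAAGG"
print(find_core_homology(toy))  # [(4, 'CTGAACAA'), (15, 'CTTAACAA')]
```

Applied to the genome sequence, a search of this kind would return the known homology regions together with chance matches such as those at 599 and 3293; separating the two groups needs the flanking-sequence score discussed in the text.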

DISCUSSION 

The 20 500 bases of sequence presented in this paper complete the sequence of the Beaudette 
strain of avian infectious bronchitis virus, the type species of the Coronaviridae. The complete 
sequence, excluding the poly(A) tail at the 3' end, is 27 608 residues. This is somewhat larger than 
the previously estimated size of the viral RNA which had been put at 20 to 24 kilobases 
(Lomniczi & Kennedy, 1977). The sequences of the ‘unique’ regions of mRNAs A, B, C, D and E
have already been published, covering some 8 kilobases at the 3' end of the genome and
including the genes for the major structural proteins of the virus. The 20 kilobases at the 5' end of
the viral RNA constitute the ‘unique’ region of mRNA F, the genome-sized RNA. This is
thought to code for a polymerase or polymerases which carry out all the necessary replication 
and transcription functions of the virus. 

Sequence analysis shows that the main part of the ‘unique’ region of mRNA F appears to 
contain two large ORFs. Because of the importance of determining whether there are one or two 
ORFs, we have considered the possibility that mRNA F in fact contained one very large ORF, 
and that a sequencing error or a mutant cDNA clone had led to a frameshift. Because of this the
sequence in the region between the two ORFs has been checked exceedingly carefully. The 
relevant sequence is shown, with translations in the three reading frames, in Fig. 4(a). Any 
frameshift error must occur within 43 bases between positions 12341 and 12383. Two 
independent cDNA clones and direct RNA sequences from virion RNA give the same result. 
There are no obvious signs of sequence artefacts such as compressions, and indeed several gel 
systems and sequencing methods which could resolve compressions (see Methods and Results) 
do not show any change in the sequence. Fig. 4(b) shows a sequencing gel representing this 
region, obtained by sequencing a cDNA clone directly using an oligonucleotide primer. It can be
seen that the sequence appears clear and unambiguous. Unless, therefore, there is some singular 
form of unresolvable and undetectable sequencing artefact, we must accept that the sequence 
here is correct. 

The problem now arises as to how translation of the second ORF, F2, is achieved. No mRNA 
has been detected at this point, and no homology region which might suggest the presence of one 
can be seen in the RNA sequence (see Results). It is possible that the ribosomes, having 
completed translation of the first ORF, F1, reinitiate translation at the first AUG of F2, or that
internal initiation occurs, as appears to be the case with the phosphoprotein mRNA of vesicular
stomatitis virus (Herman, 1986). There is, however, one piece of evidence that suggests that
neither of these alternatives is the case. If the second ORF is genuinely a separate gene, then the 
70 or so bases preceding its initiation codon should be non-coding sequences, comparable to the 
5' non-coding sequences preceding other IBV genes. In fact, if translated, they exhibit a heavy 
codon bias (Staden & McLachlan, 1982; Staden, 1984 c) similar to the bias found in other IBV 
genes. This is shown graphically in Fig. 4(c) where it can be seen that the frame with typical IBV 
codon bias switches from that of F1 to that of F2 exactly at the point where the ORF changes.
This strongly suggests that the sequences before the AUG of F2 have a coding function. One
way to resolve this problem is to postulate that on some occasions, during translation of mRNA
F, a ribosome slippage occurs, which introduces a frameshift and allows translation to continue
unhindered from F1 into F2. Ribosomal frameshifting has been described in bacteriophage
(Kastelein et al., 1982), prokaryotic (Atkins et al., 1972) and eukaryotic (Fox & Weiss-Brummer,
1980; Jacks & Varmus, 1985) systems. Such a mechanism could be conceived in the case of IBV 
as a form of translational control designed to provide coordinated expression of two 
polymerases, with the protein from the first gene being produced at a higher level than that from 
the second gene. In the case of Rous sarcoma virus (Jacks & Varmus, 1985) expression of the pol 
gene requires a frameshift by the ribosome. Some well-controlled work by these authors, using 
cell-free translation systems, has demonstrated that the frameshifting is sequence-specific. 
Moreover it occurs ten times more efficiently in a eukaryotic system than in a prokaryotic 
system, indicating that there are specific eukaryotic signals to which the prokaryotic system 
responds poorly. The region of sequence responsible for the frameshifting has been narrowed 
down to 24 nucleotides. Both IBV and Rous sarcoma virus require a shift into the -1 frame to
occur, and it may be that similar frameshifting signals are present in both sequences. 
Accordingly the 24 nucleotides of Rous sarcoma virus sequence have been compared to the 43 
nucleotides of IBV sequence within which any frameshift must occur (see Fig. 4a). Interestingly 
a match of 8/9 nucleotides can be found, both sequences occurring in the same frame and both 
within 20 bases of the termination codon (see Fig. 8). Further work will be needed to determine 
whether this sequence forms part of any signals which may promote ribosomal frameshifting. 
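
The effect of a shift into the -1 frame at the F1/F2 junction can be illustrated with the IBV sequence shown in Fig. 8. The sketch below translates with the standard genetic code and then models a slip by re-reading one base; the slip position is a free parameter here, because the actual site of any frameshift is not known, and the function names are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IBV sequence at the F1/F2 junction, as shown in Fig. 8 (F1 frame first).
JUNCTION = "GGAGCATCTGATTTTGATAAGAATTATTTAAACGGGTACGGGGTAGCAGTGAGGCTCGGCTGATACCCCTTGCTAGTG"


def translate(seq: str, start: int = 0) -> str:
    """Translate from 0-based index `start`; stop codons appear as '*'."""
    dna = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3))


def translate_with_minus1_shift(seq: str, slip: int) -> str:
    """Read frame 0 up to index `slip`, then step back one base and continue.

    `slip` is a hypothetical slip position and must be a multiple of 3.
    """
    if slip < 3 or slip % 3 != 0:
        raise ValueError("slip must be a positive multiple of 3")
    return translate(seq[:slip]) + translate(seq, slip - 1)


print(translate(JUNCTION[:63]))                   # GASDFDKNYLNGYGVAVRLG*
print(translate(JUNCTION, 20))                    # ELFKRVRGSSEARLIPLAS
print(translate_with_minus1_shift(JUNCTION, 45))  # GASDFDKNYLNGYGVSSEARLIPLAS
```

In this fragment any slip between the TAA of the F2 frame and the TGA of the F1 frame gives a product with no stop codon, which is the window of 43 bases referred to in the text.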

For each of the other IBV mRNAs, the first AUG to occur after the homology region either is 
used to initiate synthesis of a protein, as is the case for the spike and membrane proteins (Binns
et al., 1985b; Boursnell et al., 1984), or is present at the start of a reasonable sized ORF which
could code for a polypeptide of 7K or more. Thus it is surprising to find the first AUG, at
position 131, at the start of a small, 11 amino acid, ORF. The sequence context around this first
AUG does not conform to Kozak's consensus for functional initiation codons whereas the
context around the second AUG does. A similar small ORF of 12 amino acids occurs at the 5' end
of RNA 1 of alfalfa mosaic virus (Cornelissen et al., 1983), an RNA species encoding a 115K
product thought to be involved in RNA replication. In this case also only the second AUG
conforms to the Kozak consensus. Both these cases suggest the possibility that the ribosomes can
bypass the first, non-functional, AUG and initiate translation at the second. It is likely that this
also occurs in mRNA D of IBV to allow translation of the second and third ORFs (Boursnell et
al., 1985b).
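
A rough check of initiation context can be sketched as follows. It assumes the simplified form of Kozak's rule, in which a purine at position -3 and a G at position +4 (counting the A of the AUG as +1) mark a favourable context; this simplification, the function names and the test sequences are assumptions for illustration and are not taken from the analysis reported here.

```python
def kozak_context(seq: str, aug_index: int) -> dict:
    """Report the -3 and +4 positions around the ATG at 0-based `aug_index`."""
    dna = seq.upper().replace("U", "T")
    if dna[aug_index:aug_index + 3] != "ATG":
        raise ValueError("no ATG at aug_index")
    minus3 = dna[aug_index - 3] if aug_index >= 3 else None
    plus4 = dna[aug_index + 3] if aug_index + 3 < len(dna) else None
    return {"minus3_purine": minus3 in ("A", "G"), "plus4_G": plus4 == "G"}


def scan_augs(seq: str):
    """Return (1-based position, context report) for every ATG in the sequence."""
    dna = seq.upper().replace("U", "T")
    return [(i + 1, kozak_context(dna, i))
            for i in range(len(dna) - 2) if dna[i:i + 3] == "ATG"]


# Hypothetical sequence: a poor context followed by a favourable one.
toy = "TTTTTTATGTTTGCCACCATGGCG"
for position, report in scan_augs(toy):
    print(position, report)
```

On this artificial sequence the first ATG fails both tests and the second passes both, which is the pattern described in the text for the two AUGs at the 5' end of mRNA F.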

It is not known for coronaviruses whether the sequences at the 5' end of the genome produce a 
polyprotein which is subsequently cleaved into separate proteins, as is the case for alphaviruses 
(Strauss et al., 1984), or whether the viral polymerase acts as an extremely large multifunctional
enzyme. Whether or not it is cleaved post-translationally into separate proteins, such an enzyme 
would need to perform several functions. First it must synthesize the negative-stranded 
template. From this template it must synthesize the leader sequence and then the subgenomic 
mRNAs, for which it needs the ability to recognize highly conserved signal sequences (Baric et 
al., 1983, 1985; Spaan et al., 1983; Brown & Boursnell, 1984), a capping ability (Lai et al., 1982) 
and probably the ability to reinitiate transcription at these points (Lai et al., 1985; Makino et al., 
1986). If it is cleaved into separate proteins it may encode a protease function to do this. Two 
polymerase activities, early and late, have been identified in MHV-infected cells (Brayton et al., 
1982). These have different ionic requirements and different pH optima. Both polymerase 
activities are associated with two different membrane fractions, a light fraction which appears 
to synthesize positive-stranded genome-size RNA and a heavy fraction which also synthesizes 
subgenomic RNAs (Brayton et al., 1984). Some evidence for two polymerase-coding genes can 
be found in the nucleotide sequence of mRNA F, in that there are small regions of residual 
homology between the predicted amino acid sequences of F1 and F2 (see Results and Fig. 6). 

The question of whether the cDNA clones sequenced in this study might derive from mutant, 
non-viable RNA molecules is an interesting one. The error rate of RNA polymerases is fairly 
high (Steinhauer & Holland, 1986) and many of the RNA molecules in an infected cell may 
differ from those in the original infecting virus. If the mutation rate is 1 in 10000 then over the 
20 kilobases of sequence presented here, there may be one or two changes each time one strand 
is copied into another. While the viral RNA is replicating within the cell, it is likely that 
mutant, and possibly defective, virion RNA molecules will accumulate with little selection 
against them, and, unless they have gross structural defects, most of them will be packaged into 
virions. It is from these virions, without any further selection for viability, that the RNA 
used to synthesize cDNA is extracted. In addition, the infecting virus will be a mixture of 
different RNA molecules, even though it has been plaque-purified. Nevertheless, 
there is no evidence for very high mutation rates in the cDNA clones which we have sequenced 
here. For the clones covering the 20 kilobases there are 4659 bases of overlap between separate, 
independent clones (all made from the same RNA preparation). In the overlap regions there was 
not a single difference between the sequences from adjacent clones. 

This is in contrast to results found by Schubert et al. (1984) while sequencing the polymerase 
gene of vesicular stomatitis virus. The gene spans 6380 nucleotides and each region was 
sequenced from approximately three cDNA clones, giving 19140 nucleotides of overlap. In 
these 19140 nucleotides they found 20 nucleotide changes, including four insertions or deletions, 
giving an overall mutation rate of approximately 10⁻³. In the 9318 (4659 × 2) nucleotides of 
IBV cDNA clones which can be checked on another clone, there were no changes. Over 9318 
nucleotides a mutation rate of 3.2 × 10⁻⁴ would give a 95% probability of at least one 
nucleotide change; thus, since there were no changes, the overall mutation rate is probably lower 
than this. Given the number of rounds of replication which will have occurred between the 
original plaque isolation and the production of the cDNA clones, the mutation rate per base 
incorporated is likely to be considerably lower than this. It is interesting to speculate on the 
disparity between the vesicular stomatitis virus and the IBV results in this case, and on whether 
the (presumably) very large IBV polymerase, or polymerases, has a lower intrinsic error rate 
than the VSV polymerase. 
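
The arithmetic behind these estimates can be checked with a short sketch. It assumes, as a simplification not stated in the text, that changes arise independently and with equal probability at every position, so that the probability of seeing no change in n nucleotides is (1 - p)^n. The function name `rate_bound` is illustrative only.

```python
def rate_bound(n_bases, confidence=0.95):
    """Per-nucleotide rate p at which at least one change would be seen
    in n_bases with the given probability, assuming independent changes:
    1 - (1 - p)**n_bases = confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_bases)

# Vesicular stomatitis virus: 20 changes observed in 19140 nucleotides.
vsv_rate = 20 / 19140

# IBV: no changes in 4659 x 2 = 9318 checkable nucleotides.
ibv_checked = 4659 * 2
ibv_bound = rate_bound(ibv_checked)

# Expected changes per copy of 20 kilobases at a rate of 1 in 10000.
expected_per_copy = 20000 * 1e-4

print(f"VSV observed rate:   {vsv_rate:.2e}")
print(f"IBV 95% upper bound: {ibv_bound:.2e}")
print(f"Expected changes per 20 kb copy: {expected_per_copy:.1f}")
```

Under this assumption the figures agree with those quoted above: a VSV rate of about 10⁻³, a bound for IBV of about 3.2 × 10⁻⁴, and about two expected changes per 20 kilobase copy at a rate of 1 in 10000.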

Sequencing of cDNA clones from the ‘unique’ region of mRNA F has revealed the rather 
unexpected presence of two large ORFs. Although the sequence in the region between these has 
been obtained from three independent cDNA clones and from the virion RNA, the possibility 
of some bizarre form of sequence artefact cannot be totally discounted. It will be interesting to 
see if a similar frameshift occurs in an equivalent position in the coronavirus MHV genome. 
Experiments can now be designed to confirm the reading frame switch by other means. For 
example, in vitro translation of SP6 polymerase transcripts from this region can be performed and 
the sizes of the products determined. Although no mRNA has been detected with a 5' end near 
the beginning of the second ORF, a search for a low abundance mRNA species can now be 
carried out by primer extension from mRNA preparations. In addition, the availability of 
sequence data from the IBV polymerase(s) allows antisera to be raised against products 
expressed from selected parts of the sequence. These will prove useful in determining the fate of 
the large polypeptides predicted from the nucleotide sequence, showing whether post- 
translational cleavage occurs, and attempting to unravel the relationship between the various 
polymerase activities which have been detected in coronavirus-infected cells. 